FastHugs: Sequence Classification with Transformers and Fastai
Fine-tune a text classification model with HuggingFace 🤗 transformers and fastai-v2.
All FastHugs code can be found in my FastHugs GitHub repo.
Things You Might Like (❤️ ?)
FastHugsTokenizer: A tokenizer wrapper that can be used with fastai-v2's tokenizer.
FastHugsModel: A model wrapper over the HF models, more or less the same as the wrappers from the HF + fastai-v1 articles mentioned below.
Padding: Padding settings for the padding token index and for whether the transformer prefers left or right padding.
Model Splitters: Functions to split the classification head from the model backbone, in line with fastai-v2's new definition of Learner (found in splitters.py); a sketch of one is shown below.
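To give a flavour of what these splitters look like, here is a minimal sketch of a classifier splitter, assuming the FastHugsModel defined below is wrapping a RoBERTa sequence-classification model (the real functions live in splitters.py and may differ in detail):
def roberta_cls_splitter(m):
    "Split into backbone (embeddings + encoder) and classification head parameter groups"
    groups = [nn.Sequential(m.transformer.roberta.embeddings, m.transformer.roberta.encoder)]
    groups = L(groups + [m.transformer.classifier])
    return groups.map(params)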
Housekeeping
Pretrained Transformers only for now 😐
Initially, this notebook will only deal with fine-tuning HuggingFace's pretrained models. It covers the BERT, DistilBERT, RoBERTa and ALBERT pretrained classification models only. These are the core transformer architectures to which HuggingFace have added a classification head. HuggingFace also has other versions of these model architectures, such as the core model architectures and the language-model architectures.
If you'd like to try training a model from scratch, HuggingFace recently published an article on How to train a new language model from scratch using Transformers and Tokenizers. It's well worth reading to see how their tokenizers library can be used, independent of their pretrained transformer models.
Read these first 👇
This notebook heavily borrows from this notebook, which in turn is based on this tutorial and accompanying article. Huge thanks to Melissa Rajaram and Maximilien Roberti for these great resources; if you're not familiar with the HuggingFace library, please give them a read first as they are quite comprehensive.
fastai-v2 ✌️2️⃣
This paper introduces the v2 version of the fastai library and you can follow and contribute to v2's progress on the forums. This notebook uses the small IMDB sample dataset and is based on the fastai-v2 ULMFiT tutorial. Huge thanks to Jeremy, Sylvain, Rachel and the fastai community for making this library what it is. I'm super excited about the additional flexibility v2 brings.
Dependencies 📥
If you haven't already, install HuggingFace's transformers
library with: pip install transformers
#collapse
from fastai2.text.all import *   # fastai-v2 text API: untar_data, Tokenizer, Datasets, Learner, ...
from transformers import AutoModelForSequenceClassification, AutoConfig, AutoTokenizer
import matplotlib.pyplot as plt
import pandas as pd

path = untar_data(URLs.IMDB_SAMPLE)
model_path = Path('models')
df = pd.read_csv(path/'texts.csv')
#collapse
class FastHugsTokenizer():
    """
    transformer_tokenizer : the tokenizer that has been loaded from the tokenizer class
    model_name : model type set by the user
    max_seq_len : override default sequence length, typically 512 for bert-like models
    sentence_pair : whether a single sentence (sequence) or a pair of sentences is used
    """
    def __init__(self, transformer_tokenizer=None, model_name='roberta', max_seq_len=None,
                 sentence_pair=False, **kwargs):
        self.tok, self.max_seq_len = transformer_tokenizer, max_seq_len
        if self.max_seq_len:
            if self.max_seq_len > self.tok.max_len:
                print('WARNING: max_seq_len is larger than the model default transformer_tokenizer.max_len')
        if sentence_pair: self.max_seq_len = ifnone(max_seq_len, self.tok.max_len_sentences_pair)
        else: self.max_seq_len = ifnone(max_seq_len, self.tok.max_len_single_sentence)
        self.model_name = model_name

    def do_tokenize(self, o:str):
        """Limits the maximum sequence length and adds the special tokens"""
        CLS, SEP = self.tok.cls_token, self.tok.sep_token
        # Add a prefix space for RoBERTa's BPE tokenizer
        if 'roberta' in self.model_name: tokens = self.tok.tokenize(o, add_prefix_space=True)[:self.max_seq_len]
        else: tokens = self.tok.tokenize(o)[:self.max_seq_len]
        # Order of 'tokens', 'SEP' and 'CLS' depends on the model
        if 'xlnet' in self.model_name: return tokens + [SEP] + [CLS]
        else: return [CLS] + tokens + [SEP]

    def __call__(self, items):
        for o in items: yield self.do_tokenize(o)
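To sanity check the tokenizer wrapper you can call it directly. A quick illustrative example, assuming the roberta-base tokenizer loaded further below:
fht = FastHugsTokenizer(transformer_tokenizer=tokenizer, model_name='roberta-base')
next(fht(["I loved this movie"]))
# -> RoBERTa sub-word tokens wrapped in the '<s>' and '</s>' special tokens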
FastHugs Model
This nn.Module wraps the pretrained transformer model and initialises it with its config file. The forward of this module is taken straight from Melissa's notebook above; its purpose is to create the attention mask and grab only the logits from the output of the model (as the HuggingFace transformer models can also output the loss).
#collapse
class FastHugsModel(nn.Module):
    'Inspired by https://www.kaggle.com/melissarajaram/roberta-fastai-huggingface-transformers/data'
    def __init__(self, transformer_cls, config_dict, n_class, pretrained=True):
        super(FastHugsModel, self).__init__()
        self.config = config_dict
        self.config._num_labels = n_class   # size of the classification head
        # load the model, with pretrained weights or from the config alone
        # (`model_name` is the globally defined checkpoint name, e.g. 'roberta-base')
        if pretrained: self.transformer = transformer_cls.from_pretrained(model_name, config=self.config)
        else: self.transformer = transformer_cls.from_config(config=self.config)

    def forward(self, input_ids, attention_mask=None):
        # mask out the padding tokens (1 is RoBERTa's pad_token_id)
        attention_mask = (input_ids!=1).type(input_ids.type())
        logits = self.transformer(input_ids, attention_mask=attention_mask)[0]
        return logits
The HuggingFace bit
Define HuggingFace Model + Config
- AutoModelForSequenceClassification will define our model. When this is passed to the FastHugsModel class below, the model will be instantiated and the weights downloaded (if you are using a pretrained model)
- AutoConfig will define the model architecture and settings
- model_name is the model architecture (and optionally model weights) you'd like to use
  - Models tested: bert-base-uncased, roberta-base, distilbert-base-cased, albert-base-v2
- You can find all of HuggingFace's models at https://huggingface.co/models, although not all of them are supported by AutoModel, AutoConfig and AutoTokenizer
model_name = 'roberta-base'
model_class = AutoModelForSequenceClassification
config_dict = AutoConfig.from_pretrained(model_name)
HuggingFace Config changes
Some config settings can be changed even when using pretrained weights. For example, in the FastHugsModel class below, _num_labels is set when the model (pretrained or not) is instantiated, depending on how many classes you have in your dataloader.
When creating a non-pretrained model you can load a config with config_dict = AutoConfig.for_model(model_name). Alternatively you could load a pretrained config and modify that. For example, if you are not using a pretrained model you can change the size of your input embeddings by setting config_dict.max_position_embeddings = 1024. (This won't work when using pretrained models, as the pretrained weights need the default max_position_embeddings size.)
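As a minimal sketch (the names and sizes here are illustrative, not part of the notebook):
# Build a from-scratch config with a longer maximum input; this only makes sense
# with pretrained=False, since pretrained weights expect the default size
scratch_config = AutoConfig.for_model('roberta')
scratch_config.max_position_embeddings = 1024
scratch_model = FastHugsModel(transformer_cls=AutoModelForSequenceClassification,
                              config_dict=scratch_config, n_class=2, pretrained=False)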
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Build a fastai-style vocab: the tokenizer's tokens ordered by their ids
tokenizer_vocab = tokenizer.get_vocab()
tokenizer_vocab_ls = [k for k, v in sorted(tokenizer_vocab.items(), key=lambda item: item[1])]
len(tokenizer_vocab_ls)
# Look up the parameter-group splitter for this architecture (defined in splitters.py)
splitter_nm = model_name.split('-')[0] + '_cls_splitter'
model_splitter = splitters[splitter_nm]
fasthugstok and our tok_fn
Let's incorporate the tokenizer from HuggingFace into fastai-v2's framework by specifying a function called fasthugstok that we can then pass on to Tokenizer.from_df. (Note: .from_df is the only method I have tested.)
Max Sequence Length
max_seq_len is the longest sequence our tokenizer will output. You can override it by setting max_seq_len yourself; otherwise it uses the tokenizer's default, typically 512. 1024 or even 2048 can also be used depending on your GPU memory. Note that when using pretrained models you won't be able to use a max_seq_len larger than the default.
max_seq_len = None
sentence_pair=False
fasthugstok = partial(FastHugsTokenizer, transformer_tokenizer=tokenizer, model_name=model_name,
max_seq_len=max_seq_len, sentence_pair=sentence_pair)
Set up fastai's Tokenizer.from_df; we pass rules=[] to override fastai's default text processing rules.
fastai_tokenizer = Tokenizer.from_df(text_cols='text', res_col_name='text', tok_func=fasthugstok, rules=[])
splits = ColSplitter()(df)   # train/valid split from the 'is_valid' column
x_tfms = [attrgetter("text"), fastai_tokenizer, Numericalize(vocab=tokenizer_vocab_ls)]   # text -> tokens -> ids
dsets = Datasets(df, splits=splits, tfms=[x_tfms, [attrgetter("label"), Categorize()]], dl_type=SortedDL)
#collapse
def transformer_padding(tokenizer=None, max_seq_len=None, sentence_pair=False):
    # Pad on the side the transformer expects, using its own padding token id
    if tokenizer.padding_side == 'right': pad_first=False
    else: pad_first=True
    max_seq_len = ifnone(max_seq_len, tokenizer.max_len)
    return partial(pad_input_chunk, pad_first=pad_first, pad_idx=tokenizer.pad_token_id, seq_len=max_seq_len)
bs = 4
padding=transformer_padding(tokenizer)
dls = dsets.dataloaders(bs=bs, before_batch=[padding])
dls.show_batch(max_n=3, trunc_at=60)
fct_dls = TextDataLoaders.from_df(df, text_col="text", tok_tfm=fastai_tokenizer, text_vocab=tokenizer_vocab_ls,
before_batch=[padding], label_col='label', valid_col='is_valid', bs=bs)
fct_dls.show_batch(max_n=3, trunc_at=60)
opt_func = partial(Adam, decouple_wd=True)
loss = LabelSmoothingCrossEntropy()
fasthugs_model = FastHugsModel(transformer_cls=model_class, config_dict=config_dict, n_class=dls.c, pretrained=True)
learn = Learner(dls, fasthugs_model, opt_func=opt_func, splitter=model_splitter,
loss_func=loss, metrics=[accuracy]).to_fp16()
learn.freeze_to(1)   # freeze the backbone; only the classifier head will be trained
Let's find a learning rate to train our classifier head.
learn.lr_find(suggestions=True)
learn.recorder.plot_lr_find()
plt.vlines(9.999e-7, 0.65, 1.1)
plt.vlines(0.10, 0.65, 1.1)
learn.fit_one_cycle(3, lr_max=1e-3)
learn.save('roberta-fasthugs-stg1-1e-3')
learn.recorder.plot_loss()
learn.unfreeze()
learn.lr_find(suggestions=True)
learn.recorder.plot_lr_find()
plt.vlines(6.30e-8, 0.6, 1.2)
plt.vlines(0.039, 0.6, 1.2)
learn.fit_one_cycle(3, lr_max=slice(1e-5, 1e-4))
learn.save('roberta-fasthugs-stg2-3e-5')
learn.recorder.plot_loss()
learn.predict("This was a really good movie, i loved it")
from fastai2.interpret import *
#interp = Interpretation.from_learner(learn)
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(3)