Hugging Face - Transformers
Links
- Transformers Doc - Examples
- Transformers GitHub
- Messageboard
- HF @ Medium: https://medium.com/huggingface
- Model sharing and uploading: https://huggingface.co/transformers/model_sharing.html
Model
transformers.AutoConfig
- doc - impl
- Save a pretrained model: https://huggingface.co/transformers/main_classes/model.html?highlight=save_pretrained#transformers.PreTrainedModel.save_pretrained
Training
transformers.TrainingArguments
- doc
transformers.Trainer
- doc - impl
- Metrics (scroll down): https://huggingface.co/transformers/training.html#trainer
- Learning Rate Schedules: https://huggingface.co/transformers/main_classes/optimizer_schedules.html?highlight=warm%20restart#learning-rate-schedules-pytorch
- Freezing the encoder: https://huggingface.co/transformers/training.html?highlight=freezing#freezing-the-encoder
- Early Stopping
- Custom Loss Function to add class weights for unbalanced datasets: Subclass Trainer and override the compute_loss method (see example here).
- Multi Class (multi head) classification: https://discuss.huggingface.co/t/how-do-i-do-multi-class-multi-head-classification/1140
Tokenizer
tokenizers.BertWordPieceTokenizer
- for vocab generation - impl
tokenizers.normalizers.BertNormalizer
- impl
transformers.tokenization_bert.BertTokenizer
- tokenizer for normal usage - impl
transformers.tokenization_bert.BertTokenizerFast
- fast tokenizer for normal usage - impl
transformers.AutoTokenizer
- doc - impl
PreTrainedTokenizerBase.__call__
- doc
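To make the WordPiece tokenizers listed above a bit more concrete, here is a minimal sketch of greedy longest-match-first subword splitting, the encoding scheme BERT-style tokenizers are known for. It is a simplification: the real tokenizers also normalize and pre-tokenize text and cap the word length. The function name wordpiece_tokenize and the toy vocabulary are assumptions made for illustration only.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into subwords by greedy longest-match-first lookup.

    Pieces that do not start the word carry the continuation prefix.
    If any position has no matching piece, the whole word becomes unk_token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary, invented for this example.
toy_vocab = {"un", "##aff", "##able", "token", "##izer", "##s"}

print(wordpiece_tokenize("unaffable", toy_vocab))   # ['un', '##aff', '##able']
print(wordpiece_tokenize("tokenizers", toy_vocab))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("xyz", toy_vocab))         # ['[UNK]']
```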
Pipelines
- GitHub: https://github.com/huggingface/transformers/blob/master/src/transformers/pipelines.py
TextClassificationPipeline -> Pipeline -> _ScikitCompat
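As a rough picture of what a text classification pipeline does after the model has run, the sketch below turns raw logits into a label and a score with a softmax. It uses plain Python only. The function name postprocess and the example logits are assumptions, and the actual pipeline implementation may differ in detail (for example in how it handles single-label models).

```python
import math


def postprocess(logits, id2label):
    """Hypothetical helper: softmax over logits, then pick the top class."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Example logits and label mapping, invented for this sketch.
result = postprocess([-1.2, 2.3], {0: "NEGATIVE", 1: "POSITIVE"})
print(result)
```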
Data Handling
- https://github.com/huggingface/nlp
- Usage Example: https://colab.research.google.com/drive/1-JIJlao4dI-Ilww_NnTc0rxtp-ymgDgM
- Fine-tuning with custom datasets: https://huggingface.co/transformers/master/custom_datasets.html#fine-tuning-with-custom-datasets
Important Torch Classes
Links & Know-how
- https://jalammar.github.io/illustrated-transformer/
- https://www.youtube.com/watch?v=PNNHaQUQqW8
- https://github.com/huggingface/naacl_transfer_learning_tutorial
- https://ruder.io/thesis/
Last modified July 16, 2022: fix headlines in ML doc (a533e7c)