Adapters are Lightweight 🤖

"Adapter" refers to a set of newly introduced weights, typically within the layers of a transformer model. Adapters provide an alternative to fully fine-tuning the model for each downstream task, while maintaining performance. They also have the added benefit of requiring as little as 1MB of storage space per task!
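To make the idea concrete, here is a minimal sketch of one common adapter design: a bottleneck module with a residual connection, inserted inside a transformer layer. The dimensions below are illustrative (BERT-base hidden size, a 64-dimensional bottleneck); exact architectures vary by method.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A small down-project / up-project module inserted inside a transformer layer."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.activation = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter learns a small correction on top
        # of the frozen transformer representation.
        return hidden_states + self.up(self.activation(self.down(hidden_states)))

adapter = BottleneckAdapter()
x = torch.randn(1, 12, 768)  # (batch, sequence length, hidden size)
print(adapter(x).shape)      # torch.Size([1, 12, 768])

# Only ~100K parameters per layer at these sizes -- a few hundred KB in fp32,
# which is where the small per-task storage footprint comes from.
num_params = sum(p.numel() for p in adapter.parameters())
```

Because the output shape matches the input, such a module can be dropped into every transformer layer while the original weights stay frozen.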

Modular, Composable, and Extensible 🔧

Adapters, being self-contained modular units, allow for easy extension and composition. This opens up opportunities to combine adapters to solve new tasks.

Built on HuggingFace 🤗 Transformers 🚀

AdapterHub builds on the HuggingFace transformers framework, requiring as little as two additional lines of code to train adapters for a downstream task.

Quickstart 🔥

Load an Adapter for Inference 🏄

Loading existing adapters from our repository is as simple as adding one additional line of code:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.load_adapter("sst-2")  # the one additional line: load the SST adapter

The SST adapter is lightweight: it is only 3MB! At the same time, it achieves results on par with fully fine-tuned BERT. We can now leverage the SST adapter to predict the sentiment of sentences:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer.tokenize("AdapterHub is awesome!")
input_tensor = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
outputs = model(input_tensor)

Train an Adapter 🏋️️

Training a new task adapter requires only a few modifications compared to fully fine-tuning a model with Hugging Face's Trainer. We first load a pre-trained model, e.g., roberta-base, and add a new task adapter:

model = AutoModelWithHeads.from_pretrained('roberta-base')
model.add_adapter("sst-2", AdapterType.text_task)

By calling train_adapter(["sst-2"]) we freeze all transformer parameters except those of the sst-2 adapter. Before training, we add a new classification head to our model:

model.add_classification_head("sst-2", num_labels=2)

The weights of this classification head can be stored together with the adapter weights to allow for full reproducibility. Calling model.set_active_adapters([["sst-2"]]) registers the sst-2 adapter as the default for training. This mechanism also supports adapter stacking and adapter fusion!

We can then train our adapter using the Hugging Face Trainer:
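A minimal training setup might look like the following sketch. The train_dataset and eval_dataset names are placeholders for your own tokenized SST-2 splits, and the argument values other than the learning rate are illustrative, not prescribed:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="output-path",
    learning_rate=1e-4,   # adapters train best with a higher learning rate (see Tip 1)
    num_train_epochs=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized SST-2 training split
    eval_dataset=eval_dataset,    # placeholder: your tokenized SST-2 development split
)
trainer.train()
```

Because only the adapter and head parameters are unfrozen, each training step updates a small fraction of the model's weights.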

Tip 1️: Adapter weights are usually initialized randomly, which is why they require a higher learning rate. We have found that a default adapter learning rate of lr=0.0001 works well for most settings.
Tip 2️: Depending on your data set size, you might also need to train longer than usual. To avoid overfitting, you can evaluate the adapters after each epoch on the development set and save only the best model.

That's it! model.save_all_adapters('output-path') exports all adapters. Consider sharing them on AdapterHub!

Citation 📝

@inproceedings{pfeiffer2020AdapterHub,
    title = {AdapterHub: A Framework for Adapting Transformers},
    author = {Jonas Pfeiffer and
              Andreas R{\"u}ckl{\'e} and
              Clifton Poth and
              Aishwarya Kamath and
              Ivan Vuli{\'c} and
              Sebastian Ruder and
              Kyunghyun Cho and
              Iryna Gurevych},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020): Systems Demonstrations},
    year = {2020},
    address = {Online},
    publisher = {Association for Computational Linguistics},
    url = "",
    pages = {46--54},
}