Pre-trained model:
Adapter for distilbert-base-uncased in Houlsby architecture trained on the SST-2 dataset for 15 epochs with early stopping and a learning rate of 1e-4.
Adapter for distilbert-base-uncased in Pfeiffer architecture trained on the SST-2 dataset for 15 epochs with early stopping and a learning rate of 1e-4.
Adapter in Houlsby architecture trained on the binary SST task for 20 epochs with early stopping and a learning rate of 1e-4. See https://arxiv.org/pdf/2007.07779.pdf.
Pfeiffer adapter trained on the SST-2 task.
Adapter (with head) trained using the `run_glue.py` script with an extension that retains the best checkpoint (out of 30 epochs).
Adapter in Pfeiffer architecture trained on the binary SST task for 20 epochs with early stopping and a learning rate of 1e-4. See https://arxiv.org/pdf/2007.07779.pdf.
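
Several of the descriptions above mention early stopping or retaining the best checkpoint across a fixed epoch budget. The following is a minimal sketch of that selection logic only, not the training code actually used for these adapters. The function name `select_best_epoch`, the `evaluate_epoch` callback, and the `patience` value are all hypothetical; the descriptions do not state which metric or patience was used.

```python
from typing import Callable, Tuple


def select_best_epoch(
    evaluate_epoch: Callable[[int], float],
    max_epochs: int = 15,
    patience: int = 3,  # assumption: not stated in the adapter descriptions
) -> Tuple[int, float]:
    """Run up to max_epochs, stop early after `patience` epochs without
    improvement, and return (best_epoch, best_score).

    evaluate_epoch(epoch) is a hypothetical callback that trains for one
    epoch and returns a validation score where higher is better.
    """
    best_epoch = -1
    best_score = float("-inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        score = evaluate_epoch(epoch)
        if score > best_score:
            # New best: a real training loop would save a checkpoint here.
            best_epoch, best_score = epoch, score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping

    return best_epoch, best_score


# Usage with made-up validation scores (illustrative only).
example_scores = [0.80, 0.85, 0.84, 0.83, 0.82, 0.90]
best = select_best_epoch(lambda e: example_scores[e], max_epochs=len(example_scores))
```

With a very large `patience`, the loop never stops early and simply keeps the best of all epochs, which corresponds to the "best checkpoint out of 30 epochs" behavior described for the `run_glue.py` extension.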