- LoRA: Low-Rank Adaptation of Large Language Models
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
Nice summary from LoRA: Low-Rank Adaptation of Large Language Models of the adapter techniques[^1] that were available at the time:
> Adapter tuning as proposed in Houlsby et al. (2019) inserts adapter layers between the self-attention module (and the MLP module) and the subsequent residual connection. There are two fully connected layers with biases in an adapter layer with a nonlinearity in between. We call this original design AdapterH. Recently, Lin et al. (2020) proposed a more efficient design with the adapter layer applied only after the MLP module and after a LayerNorm. We call it AdapterL. This is very similar to another design proposed in Pfeiffer et al. (2021), which we call AdapterP. We also include another baseline called AdapterDrop (Rücklé et al., 2020), which drops some adapter layers for greater efficiency (AdapterD).
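To make the bottleneck design concrete, here is a minimal sketch of a Houlsby-style adapter as described in the quote: two fully connected layers with biases and a nonlinearity in between, wrapped in a residual connection. PyTorch, the GELU nonlinearity, and the bottleneck width are my own illustrative assumptions, not the reference implementation from any of the papers above.

```python
# Minimal sketch of a Houlsby-style bottleneck adapter.
# Assumptions: PyTorch, GELU nonlinearity, bottleneck_dim=64 (all illustrative).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Two fully connected layers with biases and a nonlinearity in between,
    applied with a residual connection around the adapter."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down_proj = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.nonlinearity = nn.GELU()                           # nonlinearity in between
        self.up_proj = nn.Linear(bottleneck_dim, hidden_dim)    # project back up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only adds a small learned update.
        return x + self.up_proj(self.nonlinearity(self.down_proj(x)))


# Usage: apply to the output of a Transformer sub-layer (e.g. self-attention or MLP).
adapter = BottleneckAdapter(hidden_dim=768)
hidden_states = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
print(adapter(hidden_states).shape)      # torch.Size([2, 16, 768])
```

During adapter tuning only these newly inserted parameters are trained while the pre-trained weights stay frozen; where the adapter is placed (after self-attention and/or the MLP, relative to the LayerNorm) is what distinguishes the AdapterH, AdapterL, and AdapterP variants described above.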
[^1]: should probably add some of these to the list above!