Poster
ELoRA: Efficient Low-Rank Adaptation with Random Matrices
Dawid Kopiczko · Tijmen Blankevoort · Yuki Asano
Halle B
It is becoming common practice in natural language processing to finetune pretrained language models for several downstream tasks at the same time. In practice, one might see several use cases based on the same model running simultaneously. Yet, this practice comes with considerable storage requirements, an issue that becomes particularly acute when scaling to large models or deploying numerous per-user or per-task adapted models. Although parameter-efficient finetuning methods such as LoRA exist, they do not fully mitigate this storage challenge. To this end, we introduce Efficient Low-Rank Adaptation with Random Matrices (ELoRA), which takes parameter efficiency to the extreme. By freezing a single pair of random low-rank matrices, shared across all layers, and training only small layer-wise scaling vectors, ELoRA achieves a 10x reduction in trainable parameters compared to LoRA without compromising performance. We demonstrate the effectiveness of the method on the GLUE benchmark and analyze its parameter-performance trade-off. Finally, using the Llama2 7B model, we show that ELoRA can also be used for instruction-tuning with merely 1.4M trainable parameters.
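The abstract describes the core idea: a single frozen pair of random low-rank matrices shared across all adapted layers, with only per-layer scaling vectors being trained. Below is a minimal PyTorch sketch of what such an adapted linear layer could look like; it is an illustration under the simplifying assumption that all adapted layers share the same input and output dimensions, and the class and parameter names (`ELoRALinear`, `shared_A`, `shared_B`, `d`, `b`) are hypothetical, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

# Illustrative sketch of an ELoRA-style adapted linear layer (not the authors' implementation).
# Assumptions: one frozen random pair (A, B) is shared across all adapted layers;
# only the per-layer scaling vectors d and b are trainable.

class ELoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, shared_A: torch.Tensor, shared_B: torch.Tensor):
        super().__init__()
        self.base = base_linear                       # frozen pretrained linear layer
        for p in self.base.parameters():
            p.requires_grad = False

        r = shared_A.shape[0]
        out_features = base_linear.out_features

        # Frozen random low-rank matrices, shared across layers (stored as buffers, not parameters).
        self.register_buffer("A", shared_A)           # shape: (r, in_features)
        self.register_buffer("B", shared_B)           # shape: (out_features, r)

        # Trainable layer-wise scaling vectors (the only trainable parameters per layer).
        self.d = nn.Parameter(torch.full((r,), 0.1))          # scales the r rank dimensions
        self.b = nn.Parameter(torch.zeros(out_features))      # zero init keeps the adapted model
                                                              # identical to the base model at the start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Adapted output: W0 x + diag(b) B diag(d) A x
        delta = (x @ self.A.T) * self.d               # (batch, r)
        delta = (delta @ self.B.T) * self.b           # (batch, out_features)
        return self.base(x) + delta
```

Since `A` and `B` are random and frozen, they can be regenerated from a seed at load time, so only the small vectors `d` and `b` per layer need to be stored per task, which is where the claimed storage savings over LoRA come from.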