Poster

Transformer-VQ: Linear-Time Transformers via Vector Quantization

Lucas D. Lingle

Halle B
Thu 9 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

We introduce Transformer-VQ, a decoder-only transformer that computes softmax-based dense self-attention in linear time. Transformer-VQ's efficient attention is enabled by vector-quantized keys and a novel caching mechanism. In large-scale experiments, Transformer-VQ is shown to be highly competitive in quality, with strong results on Enwik8 (0.99 bpb), PG-19 (26.6 ppl), and ImageNet64 (3.16 bpb). Code: https://github.com/transformer-vq/transformer_vq
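The key idea can be illustrated in miniature: once keys are quantized to a finite codebook of S codewords, attention scores need only be computed against the codebook, giving O(n·S) cost instead of O(n²). Below is a minimal, non-causal sketch in JAX (the framework of the linked codebase); the function and variable names are illustrative, not the paper's API, and the paper's causal decoder additionally relies on its caching mechanism for prefix aggregation, which this sketch omits.

import jax
import jax.numpy as jnp

def vq_attention(q, k, v, codebook):
    # Illustrative sketch, not the paper's implementation.
    # q, k: (n, d) queries/keys; v: (n, d_v) values; codebook: (S, d).
    # Cost is O(n * S) rather than O(n^2), since scores are computed
    # against the S codewords instead of the n keys.

    # 1) Quantize each key to its nearest codeword (one-hot assignments).
    d2 = ((k[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)          # (n, S)
    delta = jax.nn.one_hot(jnp.argmin(d2, axis=-1), codebook.shape[0])  # (n, S)

    # 2) Unnormalized scores against the codebook only.
    scores = jnp.exp(q @ codebook.T / jnp.sqrt(q.shape[-1]))            # (n, S)

    # 3) Aggregate values per codeword, then mix. This equals softmax
    #    attention over the quantized keys k_hat = delta @ codebook.
    num = scores @ (delta.T @ v)                                        # (n, d_v)
    den = scores @ delta.sum(axis=0)                                    # (n,)
    return num / den[:, None]

# Tiny smoke test with random inputs.
key = jax.random.PRNGKey(0)
kq, kk, kv, kc = jax.random.split(key, 4)
n, d, d_v, S = 8, 4, 4, 3
out = vq_attention(
    jax.random.normal(kq, (n, d)),
    jax.random.normal(kk, (n, d)),
    jax.random.normal(kv, (n, d_v)),
    jax.random.normal(kc, (S, d)),
)
print(out.shape)  # (8, 4)

Because the per-codeword value sums (delta.T @ v) can be maintained incrementally, a causal variant can process the sequence block by block with a cache of these sums, which is the role the abstract attributes to the paper's caching mechanism.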
