Reformer: The Efficient Transformer

18 Jul 2025 · 1 min read

paper

Title: Reformer: The Efficient Transformer
Authors: Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
Published: 13th January 2020 (Monday) @ 18:38:28
Link: http://arxiv.org/abs/2001.04451v2

Abstract

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from $O(L^{2})$ to $O(L \log L)$, where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
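
To make the two techniques in the abstract more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the names `make_projection`, `lsh_bucket`, `rev_forward` and `rev_inverse`, and the toy sublayers `f` and `g`, are assumptions made for this note. The hashing part uses angular LSH (project with a random matrix, then take the argmax over the projections and their negations), so vectors pointing in similar directions tend to land in the same bucket and attention can be limited to within-bucket pairs. The reversible part shows how a block's inputs can be recomputed from its outputs instead of being stored for every layer.

```python
import math
import random


def make_projection(dim, n_buckets, rng):
    """Random Gaussian projection with n_buckets // 2 rows (n_buckets must be even)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_buckets // 2)]


def lsh_bucket(vec, projection):
    """Angular LSH: bucket id = argmax over [p, -p], where p is the projected vector."""
    p = [sum(v * r for v, r in zip(vec, row)) for row in projection]
    scores = p + [-s for s in p]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for the attention and feed-forward sublayers (hypothetical).
def f(u):
    return [math.tanh(t) for t in u]


def g(u):
    return [0.5 * t for t in u]


def rev_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recover the block inputs from its outputs, so they need not be stored."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


rng = random.Random(0)
projection = make_projection(dim=4, n_buckets=8, rng=rng)
query = [0.3, -1.2, 0.7, 2.0]
bucket = lsh_bucket(query, projection)  # an integer in range(8)

y1, y2 = rev_forward([1.0, -2.0], [0.5, 3.0], f, g)
x1, x2 = rev_inverse(y1, y2, f, g)  # matches the original inputs up to rounding
```

In this sketch the bucket depends only on a vector's direction, so rescaling a query by a positive constant leaves its bucket unchanged. The inverse pass is what allows activations to be kept for a single layer at a time during backpropagation, at the cost of recomputing `f` and `g`.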
