
What is: BigBird?

Source: Big Bird: Transformers for Longer Sequences
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

BigBird is a Transformer with a sparse attention mechanism that reduces the quadratic dependency of self-attention to linear in the number of tokens. BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. In particular, BigBird consists of three main parts:

  • A set of g global tokens attending on all parts of the sequence.
  • All tokens attending to a set of w local neighboring tokens.
  • All tokens attending to a set of r random tokens.
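The three patterns above can be combined into a single boolean attention mask. Below is a minimal illustrative sketch in numpy (function and parameter names are my own, not from the paper; the real implementation uses blocked sparse operations for efficiency):

```python
import numpy as np

def bigbird_mask(seq_len, num_global, window, num_random, seed=0):
    """Boolean mask: entry [i, j] is True if query token i may attend to key j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Global tokens (here: the first num_global positions) attend to
    # everything, and every token attends back to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # Local window: each token attends to `window` neighbors on each side.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # Random attention: each token attends to num_random random keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

mask = bigbird_mask(seq_len=16, num_global=2, window=1, num_random=2)
print(mask.sum(), "attended pairs out of", mask.size)
```

Because each row of the mask has at most `num_global + (2 * window + 1) + num_random` entries set (a constant independent of sequence length), the attention cost grows linearly with the number of tokens rather than quadratically.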

This yields a high-performing attention mechanism that scales to sequences up to 8x longer than previously possible with full attention on similar hardware.