
What is: BP-Transformer?

Source: BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

The BP-Transformer (BPT) is a Transformer variant motivated by the need for a better balance between capability and computational complexity in self-attention. The architecture partitions the input sequence into multi-scale spans via binary partitioning (BP). It incorporates an inductive bias of attending to context from fine-grained to coarse-grained as the relative distance increases: the farther away the context is, the coarser its representation. BPT can be regarded as a graph neural network whose nodes are the multi-scale spans. A token node attends to smaller-scale spans for nearby context and larger-scale spans for more distant context, and the node representations are updated with Graph Self-Attention.
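To make the partitioning concrete, here is a minimal Python sketch. The function names and the per-level fan-out `k` are illustrative choices, not taken from the paper's code, and the neighbor rule is a simplification of BPT's exact graph construction. It builds the multi-scale span nodes and lists the spans a single token would attend to, fine-grained nearby and coarse-grained farther out; a power-of-two sequence length is assumed for simplicity.

```python
def build_span_nodes(seq_len):
    """Return a list of levels; level l holds (start, end) spans of width 2**l."""
    levels = [[(i, i + 1) for i in range(seq_len)]]  # level 0: one node per token
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # merge adjacent pairs; a lone trailing node is carried up unchanged
        merged = [(prev[i][0], prev[min(i + 1, len(prev) - 1)][1])
                  for i in range(0, len(prev), 2)]
        levels.append(merged)
    return levels


def token_neighbors(levels, pos, k=1):
    """Span nodes the token at `pos` attends to: up to k spans on each side
    of its ancestor span at every level, so the representation coarsens
    with distance (a simplification of BPT's graph-construction rule)."""
    out = []
    for level, spans in enumerate(levels):
        idx = pos >> level  # index of the span containing `pos` at this level
        lo, hi = max(0, idx - k), min(len(spans), idx + k + 1)
        out.extend((level, spans[j]) for j in range(lo, hi))
    return out


levels = build_span_nodes(8)
for lvl, spans in enumerate(levels):
    print(f"level {lvl}: {spans}")
# e.g. token 3 sees single-token spans nearby and wider spans farther out
print(token_neighbors(levels, pos=3))
```

Under this connectivity, each token attends to on the order of k spans per level, i.e. roughly O(k log n) neighbors instead of O(n) tokens, which is the source of BPT's reduced self-attention cost.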