
What is: PAR Transformer?

Source: Pay Attention when Required
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

PAR Transformer is a Transformer model that uses 63% fewer self-attention blocks, replacing them with feed-forward blocks, while retaining comparable test accuracy. It is based on the Transformer-XL architecture and uses neural architecture search to find an efficient pattern of blocks in the transformer architecture.
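The core idea, interleaving self-attention and feed-forward blocks according to a searched pattern rather than strictly alternating them, can be sketched in plain Python. The pattern string below is purely illustrative, not the one found by the paper's architecture search:

```python
# Minimal sketch of assembling a PAR-style transformer layer stack.
# 's' = self-attention block, 'f' = feed-forward block. In the actual
# paper the block pattern is discovered via neural architecture search;
# the example pattern here is hypothetical.

def build_layer_stack(pattern):
    """Map a pattern string to an ordered list of block names."""
    blocks = {"s": "self_attention", "f": "feed_forward"}
    return [blocks[c] for c in pattern]

# One illustrative 12-block stack with far fewer attention blocks
# than a standard transformer, which would alternate 's' and 'f'.
pattern = "ssssffffffff"
stack = build_layer_stack(pattern)
print(stack.count("self_attention"))  # 4
print(stack.count("feed_forward"))    # 8
```

A standard Transformer-XL layer corresponds to the pattern "sf" repeated once per layer; the search replaces most of the attention positions with additional feed-forward blocks.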