**Primer** is a [Transformer](https://paperswithcode.com/methods/category/transformers)-based architecture that modifies the original [Transformer](https://paperswithcode.com/method/transformer) with two changes found through [neural architecture search](https://paperswithcode.com/methods/category/neural-architecture-search): [squared ReLU activations](https://paperswithcode.com/method/squared-relu) in the feedforward block, and depthwise convolutions added to the multi-head attention projections, which yields a new module called [Multi-DConv-Head-Attention](https://paperswithcode.com/method/multi-dconv-head-attention).
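
The two Primer modifications can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the list-of-lists layout, the kernel size of 3, and the causal (left-only) padding are assumptions made for illustration.

```python
def squared_relu(x):
    """Squared ReLU: max(v, 0) ** 2 applied elementwise to a list of floats."""
    return [max(v, 0.0) ** 2 for v in x]


def depthwise_causal_conv(seq, kernels):
    """Depthwise 1-D convolution along the sequence axis.

    seq:     list of T positions, each a list of C channel values
    kernels: list of C kernels; kernels[c][k] weights the value k steps back
    Each channel is filtered only by its own kernel (no channel mixing), and
    position t only sees positions <= t (assumed causal, as in language modeling).
    """
    T, C = len(seq), len(seq[0])
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            for k, w in enumerate(kernels[c]):
                if t - k >= 0:
                    out[t][c] += w * seq[t - k][c]
    return out


# Hypothetical projected queries for one head: 3 positions, 2 channels.
q = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
# Channel 0 sums the last three positions; channel 1 passes values through.
kernels = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
q_conv = depthwise_causal_conv(q, kernels)
activated = squared_relu([-2.0, 0.0, 3.0])
```

In a Multi-DConv-Head-Attention style module, a convolution like this would be applied to the query, key, and value projections of each head before the attention scores are computed.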

**MATE** is a [Transformer](https://paperswithcode.com/method/transformer) architecture designed to model the structure of web tables. It uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. Each attention head reorders the tokens by either column or row index and then applies a windowed attention mechanism. Unlike traditional self-attention, whose cost grows quadratically with sequence length, MATE scales linearly in the sequence length.
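
The reorder-then-window idea can be sketched as follows. This is a simplified illustration rather than MATE's actual implementation: the function name `windowed_neighbors` is hypothetical, and restricting each token to a single fixed-size block is an assumption (the real windowing scheme may differ).

```python
def windowed_neighbors(order_keys, window):
    """Return, for each token id, the token ids it may attend to.

    order_keys: one key per token (its row index for a row head,
                its column index for a column head)
    window:     number of tokens per attention block
    Tokens are stably sorted by key, cut into consecutive blocks of
    `window` tokens, and each token attends only within its block.
    """
    order = sorted(range(len(order_keys)), key=lambda i: order_keys[i])
    neighbors = {}
    for start in range(0, len(order), window):
        block = order[start:start + window]
        for i in block:
            neighbors[i] = sorted(block)
    return neighbors


# A 2x3 table flattened in row-major order: token i sits at (rows[i], cols[i]).
rows = [0, 0, 0, 1, 1, 1]
cols = [0, 1, 2, 0, 1, 2]

row_head = windowed_neighbors(rows, window=3)  # token 4 sees its row: [3, 4, 5]
col_head = windowed_neighbors(cols, window=2)  # token 1 sees its column: [1, 4]
```

Because every token attends to at most `window` others, the number of attention pairs is bounded by `len(order_keys) * window`, which is linear in the sequence length for a fixed window.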

MATE

MATE: Multi-view Attention for Table Transformer Efficiency

Primer

Primer: Searching for Efficient Transformers for Language Modeling

**ComiRec** is a multi-interest framework for sequential recommendation. The multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance the recommendation accuracy and diversity.
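
One way to picture the accuracy/diversity trade-off in the aggregation module is a greedy selection with a weighting factor. The sketch below is an assumption-laden illustration, not ComiRec's published code: the function name `greedy_aggregate`, the parameter `lam` standing in for the controllable factor, and the use of item categories as the diversity signal are all hypothetical choices.

```python
def greedy_aggregate(scores, categories, top_n, lam):
    """Greedily pick top_n items, trading relevance against diversity.

    scores:     dict item -> relevance score from the multi-interest module
    categories: dict item -> category label (assumed diversity signal)
    lam:        controllable factor; 0 ranks purely by relevance
    The gain of a candidate is its score plus lam times the number of
    already selected items that belong to a different category.
    """
    selected = []
    candidates = set(scores)
    while candidates and len(selected) < top_n:
        def gain(item):
            diversity = sum(
                1 for s in selected if categories[s] != categories[item]
            )
            return scores[item] + lam * diversity

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


scores = {"a": 0.9, "b": 0.8, "c": 0.5}
categories = {"a": "x", "b": "x", "c": "y"}

accurate = greedy_aggregate(scores, categories, top_n=2, lam=0.0)  # ['a', 'b']
diverse = greedy_aggregate(scores, categories, top_n=2, lam=1.0)   # ['a', 'c']
```

With `lam=0.0` the result is the plain top-scoring list; raising `lam` lets a lower-scoring item from an unseen category displace a higher-scoring one from a category that is already represented.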

Source	Primer: Searching for Efficient Transformers for Language Modeling
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com