**TopK Copy** is a cross-attention guided copy mechanism for entity extraction in which only the Top-$k$ most important attention heads are used to compute copy distributions. The motivation is that attention heads may not be equally important, and that some heads can be pruned with only a marginal decrease in overall performance. Attention probabilities produced by insignificant attention heads may be noisy, so computing copy distributions without these heads could improve the model’s ability to infer the importance of each token in the input document.
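
As a rough illustration of the idea above, the sketch below builds a copy distribution over source tokens from only the $k$ highest-scoring heads. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `topk_copy_distribution`, the precomputed `head_importance` scores, and the choice to average the selected heads are all hypothetical, since the description does not say how head importance is measured or how the heads are combined.

```python
def topk_copy_distribution(attn_per_head, head_importance, k):
    """Average the cross-attention of the k most important heads.

    attn_per_head:   list of H rows; each row is one head's attention
                     probabilities over the source tokens (sums to 1).
    head_importance: list of H scores (assumed to be given; how they are
                     obtained is outside this sketch).
    Returns (indices of the selected heads, copy distribution).
    """
    num_heads = len(attn_per_head)
    if not 1 <= k <= num_heads:
        raise ValueError("k must be between 1 and the number of heads")

    # Rank heads by importance, highest first, and keep the top k.
    ranked = sorted(range(num_heads),
                    key=lambda h: head_importance[h],
                    reverse=True)
    top_heads = ranked[:k]

    # Assumption: combine the kept heads by a plain average, which keeps
    # the result a valid probability distribution over source tokens.
    num_tokens = len(attn_per_head[0])
    copy_dist = [
        sum(attn_per_head[h][j] for h in top_heads) / k
        for j in range(num_tokens)
    ]
    return top_heads, copy_dist


# Three heads attending over three source tokens.
attn = [
    [0.7, 0.2, 0.1],   # head 0
    [0.1, 0.1, 0.8],   # head 1 (treated as unimportant below)
    [0.6, 0.3, 0.1],   # head 2
]
importance = [0.9, 0.1, 0.5]
heads, copy_dist = topk_copy_distribution(attn, importance, k=2)
```

With `k=2`, head 1 is dropped, so the probability mass it places on the third token does not leak into the copy distribution.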

**Fast Schema Guided Tracker**, or **FastSGT**, is a fast and robust [BERT](https://paperswithcode.com/method/bert)-based model for state tracking in goal-oriented dialogue systems. The model employs carry-over mechanisms for transferring values between slots, which enables switching between services and accepting values offered by the system during the dialogue. It also uses [multi-head attention](https://paperswithcode.com/method/multi-head-attention) projections in some of the decoders to better model the encoder outputs.

The model architecture is illustrated in the figure. It consists of four main modules: (1) Utterance Encoder, (2) Schema Encoder, (3) State Decoder, and (4) State Tracker. The first three modules constitute the NLU component and are based on neural networks, whereas the State Tracker is a rule-based module. [BERT](https://paperswithcode.com/method/bert) was used for both encoders in the model.

The Utterance Encoder is a BERT model that encodes the user and system utterances at each turn. The Schema Encoder is also a BERT model; it encodes the schema descriptions of intents, slots, and values into schema embeddings. These schema embeddings help the decoders transfer or share knowledge between different services by giving them some language understanding of each slot, intent, or value. The schema and utterance embeddings are passed to the State Decoder, a multi-task module that consists of five sub-modules producing the information necessary to track the state of the dialogue. Finally, the State Tracker module takes the previous state along with the current outputs of the State Decoder and predicts the current state of the dialogue by aggregating and summarizing the information across turns.
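
To make the role of the rule-based State Tracker more concrete, here is a small sketch of a per-turn state update. It is only an illustration under assumptions: the function `update_state`, the status labels (`"none"`, `"dontcare"`, `"value"`, `"accept_offer"`), and the dictionary layout are hypothetical and are not taken from the FastSGT code. The description above only states that the tracker combines the previous state with the decoder outputs and that system-offered values can be accepted.

```python
def update_state(prev_state, slot_outputs, system_offers):
    """Apply simple rules to obtain the dialogue state after one turn.

    prev_state:    dict slot -> value carried over from earlier turns.
    slot_outputs:  dict slot -> (status, value) as a hypothetical summary
                   of what the State Decoder predicted for this turn.
    system_offers: dict slot -> value offered by the system earlier.
    """
    state = dict(prev_state)  # start from the previous state
    for slot, (status, value) in slot_outputs.items():
        if status == "none":
            continue                      # slot not mentioned: keep old value
        if status == "dontcare":
            state[slot] = "dontcare"      # user expressed no preference
        elif status == "value":
            state[slot] = value           # user supplied a new value
        elif status == "accept_offer" and slot in system_offers:
            state[slot] = system_offers[slot]  # user accepted a system offer
    return state


prev = {"city": "Paris"}
outputs = {
    "city": ("none", None),
    "date": ("value", "Friday"),
    "time": ("accept_offer", None),
    "price_range": ("dontcare", None),
}
offers = {"time": "7 pm"}
new_state = update_state(prev, outputs, offers)
```

Keeping the tracker as plain rules over decoder outputs means that only the encoders and decoders need training, which matches the split between the neural NLU component and the rule-based tracker described above.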

FastSGT

A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset

TopK Copy

Document-level Entity-based Extraction as Template Generation

**Slow Momentum** (SlowMo) is a distributed optimization method in which workers run a base optimization algorithm locally and synchronize periodically: after every $\tau$ steps of the base algorithm, the workers average their parameters using ALLREDUCE and perform a momentum update.
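
The loop described above can be sketched on a toy problem as follows. This is a single-process simulation under stated assumptions rather than a reference implementation: the workers are simulated sequentially, the averaging step merely stands in for ALLREDUCE, plain SGD is used as the base algorithm, and the description does not spell out the form of the momentum update, so the sketch assumes $u \leftarrow \beta u + (x_{\text{old}} - x_{\text{avg}})/\gamma$ followed by $x \leftarrow x_{\text{old}} - \alpha \gamma u$, with base learning rate $\gamma$, slow learning rate $\alpha$, and slow momentum $\beta$. All function and parameter names are hypothetical.

```python
def slowmo(grad_fns, x0, base_lr=0.1, slow_lr=1.0, slow_momentum=0.5,
           tau=4, outer_steps=50):
    """Simulate SlowMo for a scalar parameter.

    grad_fns: one gradient function per (simulated) worker.
    tau:      number of base-optimizer steps between synchronizations.
    """
    x = x0    # synchronized parameter shared by all workers
    u = 0.0   # slow momentum buffer
    for _ in range(outer_steps):
        # Each worker starts from the shared point and takes tau local
        # SGD steps on its own objective.
        local_params = []
        for grad in grad_fns:
            w = x
            for _ in range(tau):
                w -= base_lr * grad(w)
            local_params.append(w)

        # Stand-in for ALLREDUCE: average the workers' parameters.
        x_avg = sum(local_params) / len(local_params)

        # Assumed slow momentum update on the averaged displacement.
        u = slow_momentum * u + (x - x_avg) / base_lr
        x = x - slow_lr * base_lr * u
    return x


# Two workers minimizing (w - 1)^2 / 2 and (w - 3)^2 / 2; the average
# objective is minimized at w = 2.
workers = [lambda w: w - 1.0, lambda w: w - 3.0]
x_final = slowmo(workers, x0=10.0)
```

Under the assumed update, setting `slow_lr=1.0` and `slow_momentum=0.0` reduces the outer step to plain periodic parameter averaging, which is one way to see the momentum update as an addition on top of local-update training.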

Source	Document-level Entity-based Extraction as Template Generation
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com