**STraTA**, or **Self-Training with Task Augmentation**, is a self-training approach that builds on two key ideas for leveraging unlabeled data effectively. First, STraTA uses task augmentation, a technique that synthesizes a large amount of data for auxiliary-task fine-tuning from unlabeled target-task texts. Second, STraTA performs self-training by further fine-tuning the strong base model created by task augmentation on a broad distribution of pseudo-labeled data.

In task augmentation, we train an NLI data generation model and use it to synthesize a large amount of in-domain NLI training data for each given target task, which is then used for auxiliary (intermediate) fine-tuning. The self-training algorithm iteratively learns a better model using a concatenation of labeled and pseudo-labeled examples. At each iteration, we always start with the auxiliary-task model produced by task augmentation and train on a broad distribution of pseudo-labeled data.
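The self-training loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `CentroidClassifier` is a hypothetical toy model standing in for a fine-tuned language model, `self_train` and `num_iterations` are made-up names, and "broad distribution" is interpreted here as pseudo-labeling the whole unlabeled pool without confidence filtering, which is an assumption.

```python
from copy import deepcopy


class CentroidClassifier:
    """Hypothetical toy 1-D nearest-centroid model (stand-in for a fine-tuned LM)."""

    def __init__(self, centroids):
        self.centroids = dict(centroids)  # label -> centroid position

    def fit(self, examples):
        # Re-estimate each centroid as the mean of the inputs carrying that label.
        by_label = {}
        for x, y in examples:
            by_label.setdefault(y, []).append(x)
        for y, xs in by_label.items():
            self.centroids[y] = sum(xs) / len(xs)
        return self

    def predict(self, x):
        # Return the label whose centroid is closest to x.
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def self_train(base_model, labeled, unlabeled, num_iterations=3):
    """Iterative self-training that restarts from `base_model` at every iteration."""
    # The first teacher is the base (auxiliary-task) model trained on labeled data only.
    teacher = deepcopy(base_model).fit(labeled)
    for _ in range(num_iterations):
        # Assumption: pseudo-label the entire unlabeled pool, with no filtering.
        pseudo_labeled = [(x, teacher.predict(x)) for x in unlabeled]
        # Always start again from a fresh copy of the base model, not from the teacher.
        student = deepcopy(base_model).fit(labeled + pseudo_labeled)
        teacher = student
    return teacher


base = CentroidClassifier({0: 0.0, 1: 1.0})
labeled = [(0.1, 0), (0.9, 1)]
unlabeled = [0.0, 0.2, 0.3, 0.7, 0.8, 1.0]
model = self_train(base, labeled, unlabeled)
```

Copying the base model at each iteration, rather than continuing from the previous student, mirrors the statement that every iteration starts from the auxiliary-task model.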

**Adaptive Input Embeddings** extend the [adaptive softmax](https://paperswithcode.com/method/adaptive-softmax) to input word representations. The factorization assigns more capacity to frequent words and less capacity to infrequent words, which reduces overfitting to rare words.
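One way to picture the variable-capacity idea is the pure-Python sketch below. Several details are assumptions that the text above does not spell out: token ids are taken to be sorted by frequency (id 0 is the most frequent word), the vocabulary is split into bands at hypothetical `cutoffs`, each successive band shrinks its embedding width by a `factor`, and a per-band linear projection maps every band back to a common width `dim`. The class and method names are illustrative only.

```python
import random


class AdaptiveInputEmbedding:
    """Toy sketch: frequent-word bands get wide embeddings, rare-word bands narrow ones."""

    def __init__(self, vocab_size, dim, cutoffs, factor=4, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Band i covers token ids in [bounds[i], bounds[i + 1]).
        self.bounds = [0] + list(cutoffs) + [vocab_size]
        self.tables = []       # per band: one embedding row per token in the band
        self.projections = []  # per band: matrix of shape band_dim x dim
        for i in range(len(self.bounds) - 1):
            band_size = self.bounds[i + 1] - self.bounds[i]
            band_dim = max(1, dim // factor ** i)  # capacity shrinks for rarer bands
            self.tables.append(
                [[rng.gauss(0, 1) for _ in range(band_dim)] for _ in range(band_size)]
            )
            self.projections.append(
                [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(band_dim)]
            )

    def band_of(self, token_id):
        # Find the frequency band that contains this token id.
        for i in range(len(self.bounds) - 1):
            if self.bounds[i] <= token_id < self.bounds[i + 1]:
                return i
        raise IndexError(f"token id {token_id} is outside the vocabulary")

    def embed(self, token_id):
        # Look up the narrow band-specific vector, then project it up to `dim`.
        i = self.band_of(token_id)
        vec = self.tables[i][token_id - self.bounds[i]]
        proj = self.projections[i]
        return [
            sum(vec[r] * proj[r][c] for r in range(len(vec)))
            for c in range(self.dim)
        ]

    def num_parameters(self):
        # Count embedding-table entries plus projection-matrix entries.
        total = 0
        for table, proj in zip(self.tables, self.projections):
            total += len(table) * len(table[0]) + len(proj) * len(proj[0])
        return total


emb = AdaptiveInputEmbedding(vocab_size=100, dim=16, cutoffs=[10, 40])
frequent_vector = emb.embed(3)   # token from the widest band
rare_vector = emb.embed(99)      # token from the narrowest band
```

Both vectors come out with the same width, so downstream layers see a uniform input, while the rare-word bands store far fewer parameters per token than a single full-width table would.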

Adaptive Input Representations

Adaptive Input Representations for Neural Language Modeling

STraTA

STraTA: Self-Training with Task Augmentation for Better Few-shot Learning

**FairMOT** is a multi-object tracking model that consists of two homogeneous branches, one predicting pixel-wise objectness scores and the other predicting re-ID features. Treating the two tasks fairly is what allows the model to reach high detection and tracking accuracy. The detection branch is implemented in an anchor-free style and estimates object centers and sizes, represented as position-aware measurement maps. Similarly, the re-ID branch estimates a re-ID feature for each pixel to characterize the object centered at that pixel. Note that the two branches are completely homogeneous, which differs fundamentally from previous methods that perform detection and re-ID in a cascaded style. It is also worth noting that FairMOT operates on high-resolution feature maps of stride 4, while previous anchor-based methods operate on feature maps of stride 32. Eliminating anchors and using high-resolution feature maps better aligns re-ID features with object centers, which significantly improves tracking accuracy.
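To make the two-branch output concrete, the sketch below shows one plausible way to read detections and re-ID features from the per-pixel maps. It is a simplified illustration under stated assumptions, not the FairMOT code: the function name `decode_detections`, the `threshold` value, the 3x3 local-maximum test used to pick centers, and the convention that sizes are given in input-image pixels are all hypothetical choices.

```python
def decode_detections(heatmap, sizes, reid, stride=4, threshold=0.5):
    """Pick object centers from the objectness map and read size and re-ID at each one.

    heatmap[y][x] -> objectness score on the stride-`stride` feature map
    sizes[y][x]   -> (width, height) of the object centered at that pixel
                     (assumed to be in input-image pixels)
    reid[y][x]    -> re-ID feature vector for the object centered at that pixel
    """
    height, width = len(heatmap), len(heatmap[0])
    detections = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep the pixel only if no pixel in its 3x3 neighbourhood scores higher.
            neighbourhood = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            if score < max(neighbourhood):
                continue
            # Map the feature-map location back to input-image coordinates.
            cx, cy = x * stride, y * stride
            box_w, box_h = sizes[y][x]
            detections.append({
                "box": (cx - box_w / 2, cy - box_h / 2, cx + box_w / 2, cy + box_h / 2),
                "score": score,
                "reid": reid[y][x],  # feature taken at the same pixel as the center
            })
    return detections


# Tiny 3x4 feature map with a single object centered at row 1, column 2.
heatmap = [[0.1, 0.1, 0.1, 0.1],
           [0.1, 0.2, 0.9, 0.1],
           [0.1, 0.1, 0.1, 0.1]]
sizes = [[(8, 4)] * 4 for _ in range(3)]
reid = [[[y, x] for x in range(4)] for y in range(3)]
detections = decode_detections(heatmap, sizes, reid)
```

Because the center and the re-ID feature are read from the same pixel, the identity embedding is tied directly to the object center, which is the alignment that the anchor-free, high-resolution design is meant to provide.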

Source	STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com