**mBART** is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the [BART objective](https://paperswithcode.com/method/bart). The input texts are noised by masking phrases and permuting sentences, and a single [Transformer model](https://paperswithcode.com/method/transformer) is trained to recover the original texts. Unlike other pre-training approaches for machine translation, mBART pre-trains a complete autoregressive [Seq2Seq](https://paperswithcode.com/method/seq2seq) model. mBART is trained once for all languages, providing a single set of parameters that can be fine-tuned for any language pair in both supervised and unsupervised settings, without any task-specific or language-specific modifications or initialization schemes.
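
The two noising operations described above (masking phrases and permuting sentences) can be illustrated with a small sketch. This is not the mBART preprocessing code: the function name `add_noise`, the `<mask>` token string, the uniform span lengths, the per-sentence masking budget, and the default values are all assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask token string


def add_noise(sentences, mask_ratio=0.35, max_span=3, seed=0):
    """Permute sentences, then replace random word spans with one MASK each.

    Illustrative only: span lengths are drawn uniformly from 1..max_span and
    the masking budget is applied per sentence (both are assumptions).
    """
    rng = random.Random(seed)

    # 1) Sentence permutation: shuffle the order of whole sentences.
    shuffled = list(sentences)
    rng.shuffle(shuffled)

    # 2) Phrase masking: collapse each selected span into a single MASK token.
    noised = []
    for sentence in shuffled:
        words = sentence.split()
        budget = int(len(words) * mask_ratio)  # max words to hide here
        out, i = [], 0
        while i < len(words):
            remaining = len(words) - i
            if budget > 0 and rng.random() < mask_ratio:
                span = min(rng.randint(1, max_span), budget, remaining)
                out.append(MASK)
                i += span
                budget -= span
            else:
                out.append(words[i])
                i += 1
        noised.append(" ".join(out))
    return noised


if __name__ == "__main__":
    doc = ["the cat sat on the mat", "dogs bark loudly at night", "birds sing"]
    for line in add_noise(doc, mask_ratio=0.5, seed=1):
        print(line)
```

During pre-training, the model would receive the noised text as input and be trained to reproduce the original, unpermuted and unmasked text.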

**PSPNet**, or **Pyramid Scene Parsing Network**, is a semantic segmentation model that uses a pyramid parsing module to exploit global context information through different-region-based context aggregation. The local and global clues together make the final prediction more reliable. We also propose an optimization

Given an input image, PSPNet uses a pretrained CNN with the dilated network strategy to extract the feature map. The final feature map is $1/8$ the size of the input image. On top of this map, we use the [pyramid pooling module](https://paperswithcode.com/method/pyramid-pooling-module) to gather context information. With our 4-level pyramid, the pooling kernels cover the whole image, half of it, and small portions of it. The pooled features are fused as the global prior.
Then we concatenate the prior with the original feature map. The result is followed by a [convolution](https://paperswithcode.com/method/convolution) layer that generates the final prediction map.
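
The pooling-and-concatenation step can be sketched in plain Python on a single-channel feature map. This is a simplified illustration, not the PSPNet implementation. The bin sizes `(1, 2, 3, 6)` are assumed as the four pyramid levels, nearest-neighbour upsampling stands in for interpolation, and the per-level convolutions of the real module are omitted. All function names are hypothetical.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D list of numbers into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for by in range(bins):
        # Region bounds: floor for the start index, ceil for the end index.
        y0, y1 = (by * h) // bins, -((-(by + 1) * h) // bins)
        row = []
        for bx in range(bins):
            x0, x1 = (bx * w) // bins, -((-(bx + 1) * w) // bins)
            cells = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid, h, w):
    """Resize a square grid back to h x w by nearest-neighbour lookup."""
    bins = len(grid)
    return [[grid[(y * bins) // h][(x * bins) // w] for x in range(w)]
            for y in range(h)]


def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the original map plus one upsampled context map per level."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]  # the original feature map comes first
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(fmap, bins)
        channels.append(upsample_nearest(pooled, h, w))
    return channels  # "concatenation" along a channel axis


if __name__ == "__main__":
    fmap = [[y * 6 + x for x in range(6)] for y in range(6)]
    stacked = pyramid_pooling(fmap)
    print(len(stacked), "channels of size", len(stacked[0]), "x", len(stacked[0][0]))
```

The coarsest level (one bin) summarises the whole map, while finer levels summarise progressively smaller regions. In the full network, a convolution layer over the stacked channels would produce the per-pixel prediction.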

PSPNet

Pyramid Scene Parsing Network

mBART

Multilingual Denoising Pre-training for Neural Machine Translation

Lightweight or mobile neural networks used for real-time computer vision tasks contain fewer parameters than normal
networks, which leads to constrained performance. In this work, we propose a novel activation function named the Tanh Exponential
Activation Function (TanhExp), which can significantly improve the performance of these networks on image classification tasks.
TanhExp is defined as $f(x) = x \tanh(e^x)$. We demonstrate the simplicity, efficiency, and robustness of TanhExp on various
datasets and network models, where TanhExp outperforms its counterparts in both convergence speed and accuracy. Its behaviour
also remains stable when noise is added or the dataset is altered. We show that, without increasing the size of the network, the
capacity of lightweight neural networks can be enhanced by TanhExp with only a few training epochs and no extra parameters
added.
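
As a minimal sketch, the definition $f(x) = x \tanh(e^x)$ can be evaluated for scalars with the standard library alone. The function name `tanhexp` and the overflow cutoff of 20 are choices made here for illustration and do not come from the paper.

```python
import math


def tanhexp(x):
    """Evaluate f(x) = x * tanh(exp(x)) for a scalar x."""
    # For x > 20, tanh(exp(x)) rounds to 1.0 in double precision, so
    # returning x directly avoids an OverflowError from math.exp.
    if x > 20.0:
        return x
    return x * math.tanh(math.exp(x))


if __name__ == "__main__":
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(value, tanhexp(value))
```

For large positive inputs the function behaves like the identity, and for large negative inputs it approaches zero from below.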

Source	Multilingual Denoising Pre-training for Neural Machine Translation
Year	2020
Data Source	CC BY-SA - https://paperswithcode.com