**XLM** is a [Transformer](https://paperswithcode.com/method/transformer)-based architecture that is pre-trained using one of three language modeling objectives:

1. Causal Language Modeling (CLM) - models the probability of a word given the previous words in a sentence.
2. Masked Language Modeling (MLM) - the masked language modeling objective of [BERT](https://paperswithcode.com/method/bert).
3. Translation Language Modeling (TLM) - a new translation language modeling objective for improving cross-lingual pre-training.
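
The causal objective in the first item can be written as the usual left-to-right factorization of a sentence's probability. The notation here ($w_t$ for the $t$-th word, $n$ for the sentence length) is chosen for illustration and is not taken from the entry itself:

$$
P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})
$$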

The authors find that both the CLM and MLM approaches provide strong cross-lingual features that can be used for pretraining models.

**Singular Value Clipping (SVC)** is an adversarial training technique used by [TGAN](https://paperswithcode.com/method/tgan) to enforce the 1-Lipschitz constraint of the [WGAN](https://paperswithcode.com/method/wgan) objective. It constrains every linear layer in the discriminator so that the spectral norm of its weight parameter $W$ is at most one, which means that all singular values of the weight matrix are at most one. To enforce this, singular value decomposition (SVD) is performed after a parameter update, every singular value larger than one is replaced with one, and the parameters are reconstructed from the clipped values. The same operation is applied to convolutional layers by interpreting the higher-order weight tensor as a matrix $\hat{W}$.
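
The clipping step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the TGAN authors' implementation: the function name `clip_singular_values`, the `max_sv` parameter, and the choice to flatten a convolutional kernel of shape `(out_channels, in_channels, kh, kw)` into an `(out_channels, in_channels * kh * kw)` matrix are all assumptions made for the example.

```python
import numpy as np


def clip_singular_values(weight, max_sv=1.0):
    """Return a copy of `weight` whose singular values are at most `max_sv`.

    Hypothetical helper for illustration only. Tensors with more than two
    dimensions (e.g. convolution kernels) are viewed as a matrix by keeping
    the first axis and flattening the rest, which is one possible way to
    form the matrix W-hat.
    """
    matrix = weight.reshape(weight.shape[0], -1)

    # Thin SVD: matrix = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)

    # Replace every singular value larger than max_sv with max_sv.
    s_clipped = np.minimum(s, max_sv)

    # Reconstruct the parameters and restore the original tensor shape.
    return ((u * s_clipped) @ vt).reshape(weight.shape)


# Example: clip a random "convolution kernel" after a (pretend) update.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(8, 4, 3, 3))
clipped = clip_singular_values(kernel)
largest = np.linalg.svd(clipped.reshape(8, -1), compute_uv=False).max()
print(largest <= 1.0 + 1e-8)
```

In a training loop, a function like this would be applied to each discriminator weight after the optimizer step. Weights whose singular values are already at most one pass through unchanged up to floating-point error.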

Singular Value Clipping

Temporal Generative Adversarial Nets with Singular Value Clipping

Cross-lingual Language Model Pretraining

Train a convolutional neural network to generate the contents of an arbitrary image region conditioned on its surroundings.

Source	Cross-lingual Language Model Pretraining
Year	2019
Data Source	CC BY-SA - https://paperswithcode.com