
What is: Contrastive BERT?

Source: CoBERL: Contrastive BERT for Reinforcement Learning
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Contrastive BERT (CoBERL) is a reinforcement learning agent that combines a new contrastive loss with a hybrid LSTM-transformer architecture to improve data efficiency in RL. It uses bidirectional masked prediction in combination with a generalization of recent contrastive methods to learn better representations for transformers in RL, without the need for hand-engineered data augmentations.
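
To make the masked-prediction-plus-contrastive idea concrete, here is a minimal PyTorch sketch of such an objective. It is a simplified stand-in, not the paper's exact loss (CoBERL adapts RELIC, which adds further regularization terms): an InfoNCE-style loss that aligns the transformer's predictions at masked time steps with the corresponding encoder embeddings. The function name, the stop-gradient on the targets, and all parameters are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def masked_contrastive_loss(pred, target, mask, temperature=0.1):
    """InfoNCE-style stand-in for CoBERL's contrastive objective.

    pred:   (T, D) transformer outputs X_t at every time step
    target: (T, D) encoder embeddings Y_t, used as targets
    mask:   (T,) boolean, True at the time steps whose inputs were masked
    """
    pred = F.normalize(pred[mask], dim=-1)                # predictions at masked steps
    target = F.normalize(target[mask].detach(), dim=-1)   # stop-gradient on targets (a design choice)
    logits = pred @ target.t() / temperature              # pairwise similarities
    labels = torch.arange(logits.size(0))                 # the matching time step is the positive
    return F.cross_entropy(logits, labels)

# Example: BERT-style masking of roughly 15% of a 32-step trajectory.
T, D = 32, 64
x_t, y_t = torch.randn(T, D), torch.randn(T, D)
mask = torch.rand(T) < 0.15
loss = masked_contrastive_loss(x_t, y_t, mask)
```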

For the architecture, a residual network encodes observations into embeddings $Y_t$. $Y_t$ is fed through a causally masked GTrXL transformer, which computes the predicted masked inputs $X_t$ and passes them, together with $Y_t$, to a learnt gate. The output of the gate is passed through a single LSTM layer to produce the values used for computing the RL loss. A contrastive loss is computed using the predicted masked inputs $X_t$ and $Y_t$ as targets; for this, the causal mask of the transformer is not used.
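
The data flow above can be summarized in a short PyTorch sketch. This is a loose approximation under stated assumptions: the residual network is replaced by a small MLP, GTrXL by a standard `nn.TransformerEncoder` with a causal mask, and the learnt gate (GRU-type gating in the paper) by a simple sigmoid gate; `CoBERLSketch` and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class CoBERLSketch(nn.Module):
    """Loose sketch of CoBERL's data flow: encoder -> transformer -> gate -> LSTM."""

    def __init__(self, obs_dim, d_model=64):
        super().__init__()
        # Stand-in for the residual network that produces the embeddings Y_t.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        # Stand-in for GTrXL: a standard transformer encoder.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Simplified learnt gate over [X_t, Y_t]; the paper uses GRU-type gating.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, obs):                      # obs: (B, T, obs_dim)
        y = self.encoder(obs)                    # embeddings Y_t
        causal = nn.Transformer.generate_square_subsequent_mask(y.size(1))
        x = self.transformer(y, mask=causal)     # predicted inputs X_t (causal, for RL)
        g = torch.sigmoid(self.gate(torch.cat([x, y], dim=-1)))
        out, _ = self.lstm(g * x + (1 - g) * y)  # values used for the RL loss
        # For the contrastive loss, the transformer is re-run on masked inputs
        # without the causal mask; X_t and Y_t are then compared as in the loss sketch.
        return out, x, y
```

Here `out` would feed the RL value and policy heads, while `x` and `y` feed a masked contrastive loss like the earlier sketch.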