**Kalman Optimization for Value Approximation**, or **KOVA** is a general framework for addressing uncertainties while approximating value-based functions in deep RL domains. KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties. It is feasible when using non-linear approximation functions as DNNs and can estimate the value in both on-policy and off-policy settings. It can be incorporated as a policy evaluation component in policy optimization algorithms.

**DeBERTa** is a [Transformer](https://paperswithcode.com/methods/category/transformers)-based neural language model that aims to improve the [BERT](https://paperswithcode.com/method/bert) and [RoBERTa](https://paperswithcode.com/method/roberta) models with two techniques: a [disentangled attention mechanism](https://paperswithcode.com/method/disentangled-attention-mechanism) and an enhanced mask decoder. The disentangled attention mechanism is where each word is represented unchanged using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangle matrices on their contents and relative positions. The enhanced mask decoder is used to replace the output [softmax](https://paperswithcode.com/method/softmax) layer to predict the masked tokens for model pre-training.  In addition, a new virtual adversarial training method is used for fine-tuning to improve model’s generalization on downstream tasks.

DeBERTa

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

KOVA

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

A **Highway Network** is an architecture designed to ease gradient-based training of very deep networks. They allow unimpeded information flow across several layers on "information highways". The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network. Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions.

Source	Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com