A **Double Deep Q-Network**, or **Double DQN**, uses [Double Q-learning](https://paperswithcode.com/method/double-q-learning) to reduce overestimation by decomposing the max operation in the target into action selection and action evaluation. The greedy action is selected according to the online network, but its value is estimated with the target network. The update is the same as for [DQN](https://paperswithcode.com/method/dqn), except that the target $Y^{DQN}\_{t}$ is replaced with:

$$ Y^{DoubleDQN}\_{t} = R\_{t+1}+\gamma{Q}\left(S\_{t+1}, \arg\max\_{a}Q\left(S\_{t+1}, a; \theta\_{t}\right);\theta\_{t}^{-}\right) $$

Compared to the original formulation of Double [Q-Learning](https://paperswithcode.com/method/q-learning), in Double DQN the weights of the second network $\theta^{'}\_{t}$ are replaced with the weights of the target network $\theta\_{t}^{-}$ for the evaluation of the current greedy policy.
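The difference between the two targets can be made concrete with a small sketch. The code below is illustrative only: the function names (`double_dqn_target`, `dqn_target`, `argmax`) are hypothetical, the Q-values for the next state are assumed to be given as plain lists with one entry per action, and the `done` flag for terminal transitions is an assumption that does not appear in the formula above.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],  # Q(S_{t+1}, a; theta_t) for every action a
    q_target_next: Sequence[float],  # Q(S_{t+1}, a; theta_t^-) for every action a
    done: bool = False,              # assumption: no bootstrapping on terminal states
) -> float:
    """Double DQN target: select with the online net, evaluate with the target net."""
    if done:
        return reward
    greedy_action = argmax(q_online_next)                 # action selection
    return reward + gamma * q_target_next[greedy_action]  # action evaluation


def dqn_target(
    reward: float,
    gamma: float,
    q_target_next: Sequence[float],
    done: bool = False,
) -> float:
    """Standard DQN target: the target net both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


# Toy numbers: the two networks disagree about which action is best.
q_online = [1.0, 2.0, 0.5]
q_target = [0.8, 1.5, 3.0]

y_double = double_dqn_target(1.0, 0.5, q_online, q_target)  # 1 + 0.5 * 1.5
y_dqn = dqn_target(1.0, 0.5, q_target)                      # 1 + 0.5 * 3.0
```

With these toy values the Double DQN target is 1.75, while the standard DQN target is 2.5. The target network's large estimate for the third action is ignored because the online network does not select that action, which is the mechanism by which the decoupling is meant to curb overestimation.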

The motivation of this work is based on the hypothesis that historical values of the acquisition function are good predictors of their future values. This idea is quite intuitive. For example, once a model is certain about its predictions on a given sample, this fact is unlikely to change. This can be explained by the randomness in the training, especially when using small acquisition sizes.

CEAL

Double DQN

Deep Reinforcement Learning with Double Q-learning

**CANINE** is a pre-trained encoder for language understanding that operates directly on character sequences, without explicit tokenization or vocabulary. It is paired with a pre-training strategy that uses soft inductive biases in place of hard token boundaries. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep [transformer](https://paperswithcode.com/method/transformer) stack, which encodes context.

Source	Deep Reinforcement Learning with Double Q-learning
Year	2015
Data Source	CC BY-SA - https://paperswithcode.com