
What is: Double DQN?

Source: Deep Reinforcement Learning with Double Q-learning
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

A Double Deep Q-Network, or Double DQN, utilises Double Q-learning to reduce overestimation by decomposing the max operation in the target into action selection and action evaluation. The greedy action is selected according to the online network, but the target network is used to estimate its value. The update is the same as for DQN, but the target $Y^{DQN}_{t}$ is replaced with:

$$Y^{DoubleDQN}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a} Q\left(S_{t+1}, a; \theta_{t}\right); \theta_{t}^{-}\right)$$

Compared to the original formulation of Double Q-learning, in Double DQN the weights of the second network $\theta'_{t}$ are replaced with the weights of the target network $\theta_{t}^{-}$ for the evaluation of the current greedy policy.
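As a minimal sketch, the two targets can be computed side by side with NumPy. The function names and the batched Q-value arrays (`next_q_online`, `next_q_target`, shape `[batch, num_actions]`) are illustrative assumptions, not from the paper:

```python
import numpy as np

def dqn_target(rewards, next_q_target, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates
    # the greedy action via a single max over next-state Q-values.
    return rewards + gamma * next_q_target.max(axis=1)

def double_dqn_target(rewards, next_q_online, next_q_target, gamma=0.99):
    # Double DQN: the online network selects the greedy action...
    best_actions = next_q_online.argmax(axis=1)
    # ...and the target network evaluates that action's value.
    eval_values = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * eval_values
```

When the online network overestimates an action that the target network values lower, the Double DQN target uses the lower evaluation, which is exactly the decoupling that reduces the overestimation bias. (Terminal-state masking is omitted for brevity.)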