What is: Clipped Double Q-learning?

Source: Addressing Function Approximation Error in Actor-Critic Methods
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Clipped Double Q-learning is a variant on Double Q-learning that upper-bounds the less biased Q estimate $Q_{\theta_2}$ by the biased estimate $Q_{\theta_1}$. This is equivalent to taking the minimum of the two estimates, resulting in the following target update:

$$y_1 = r + \gamma \min_{i=1,2} Q_{\theta'_i}\left(s', \pi_{\phi_1}\left(s'\right)\right)$$

The motivation for this extension is that vanilla double Q-learning is sometimes ineffective if the target and current networks are too similar, e.g. with a slow-changing policy in an actor-critic framework.
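
As a rough illustration of the target update, the minimal Python sketch below computes the clipped target for a single transition. The function name `clipped_double_q_target`, its parameters, and the `done` flag for terminal states are assumptions made for this example and do not come from the source. The two inputs `q1_next` and `q2_next` stand for the two target critics' values at the next state and the policy's action there, which in practice would come from neural networks.

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Return r + gamma * min(Q1', Q2') for one transition.

    q1_next, q2_next: the two target critics evaluated at (s', pi(s')).
    done: hypothetical terminal flag (an assumption of this sketch);
          when True, the bootstrap term is dropped.
    """
    # Taking the minimum bounds the target by the smaller of the two estimates.
    min_q = min(q1_next, q2_next)
    bootstrap = 0.0 if done else gamma * min_q
    return reward + bootstrap


# Illustrative values only: (reward, Q1', Q2') for two transitions.
batch = [(1.0, 10.0, 8.0), (0.0, 5.0, 7.0)]
targets = [clipped_double_q_target(r, q1, q2, gamma=0.9) for r, q1, q2 in batch]
print(targets)
```

In both transitions the smaller critic value is the one that enters the target, regardless of which critic produced it. The same target would typically be used to train both critics.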