What is: Clipped Double Q-learning?
Source | Addressing Function Approximation Error in Actor-Critic Methods |
Year | 2018 |
Data Source | CC BY-SA - https://paperswithcode.com |
Clipped Double Q-learning is a variant of Double Q-learning that upper-bounds the less biased Q estimate $Q_{\theta_2}$ by the biased estimate $Q_{\theta_1}$. This is equivalent to taking the minimum of the two estimates, resulting in the following target update:

$$y_1 = r + \gamma \min_{i=1,2} Q_{\theta'_i}\left(s', \pi_{\phi_1}(s')\right)$$
The motivation for this extension is that vanilla double Q-learning is sometimes ineffective if the target and current networks are too similar, e.g. with a slow-changing policy in an actor-critic framework.
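As a rough illustration, the sketch below computes this target in PyTorch. The `actor`, `critic1_target`, and `critic2_target` modules, the `done` mask, and the discount `gamma` are placeholder assumptions for whatever networks and hyperparameters the agent uses, not names taken from the paper.

```python
import torch

def clipped_double_q_target(reward, next_state, done,
                            actor, critic1_target, critic2_target,
                            gamma=0.99):
    """Sketch of the clipped target y = r + gamma * min_i Q_{theta'_i}(s', pi(s'))."""
    with torch.no_grad():
        next_action = actor(next_state)                # pi_{phi_1}(s')
        q1 = critic1_target(next_state, next_action)   # Q_{theta'_1}(s', a')
        q2 = critic2_target(next_state, next_action)   # Q_{theta'_2}(s', a')
        # Upper-bound the less biased estimate by the biased one,
        # i.e. take the element-wise minimum of the two target critics.
        min_q = torch.min(q1, q2)
        # Standard TD target; (1 - done) zeroes the bootstrap at terminal states.
        target = reward + gamma * (1.0 - done) * min_q
    return target
```

Both critics are then regressed toward this single shared target, which is what discourages the overestimation that a single critic (or two independently targeted critics) would accumulate.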