What is Q-Learning?
Year | 1989 |
Data Source | CC BY-SA - https://paperswithcode.com |
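Q-Learning is an off-policy temporal-difference (TD) control algorithm, defined by the update
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$
where $\alpha$ is the step size and $\gamma$ is the discount factor.
The learned action-value function $Q$ directly approximates $q_*$, the optimal action-value function, independent of the policy being followed.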
Source: Sutton and Barto, Reinforcement Learning: An Introduction, 2nd Edition
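As an illustration of the off-policy idea, here is a minimal sketch of tabular Q-learning in Python. It is not taken from the source: the corridor environment `chain_step`, the helper names, and the hyperparameter values are all hypothetical choices made for this example. The point to notice is that actions are chosen by an epsilon-greedy behavior policy, while the update target always uses the maximum over next-state action values.

```python
import random
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning update in place and return the TD error."""
    # Off-policy target: bootstrap from the best action in next_state,
    # regardless of which action the behavior policy takes there.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Behavior policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    # Break ties at random so the untrained agent does not favor one action.
    return rng.choice([a for a in actions if Q[(state, a)] == best])


def chain_step(state, action, n_states=5):
    """Hypothetical toy environment: a corridor of n_states cells.

    Action 1 moves right, action 0 moves left (clamped at cell 0).
    Reaching the last cell yields reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Run tabular Q-learning on the toy corridor and return the Q-table."""
    rng = random.Random(seed)
    actions = [0, 1]
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon, rng)
            next_state, reward, done = chain_step(state, action)
            q_update(Q, state, action, reward, next_state, actions,
                     alpha, gamma, done)
            state = next_state
    return Q


if __name__ == "__main__":
    Q = train()
    for s in range(4):
        print(s, round(Q[(s, 0)], 3), round(Q[(s, 1)], 3))
```

In this toy setting the greedy policy read off the learned table moves right in every cell, even though the agent kept taking exploratory left moves during training. That is the sense in which the learned values are independent of the policy being followed.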