What is: Expected Sarsa?
Year | 2000 |
Data Source | CC BY-SA - https://paperswithcode.com |
Expected Sarsa is like Q-learning, but instead of taking the maximum over next state-action pairs, it uses the expected value, weighting each next action by how likely it is under the current policy.
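In Sutton and Barto's notation (step size $\alpha$, discount $\gamma$, policy $\pi$), the Expected Sarsa update is shown below. Q-learning's target would replace the sum over actions with $\max_a Q(S_{t+1}, a)$.

$$
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \Big[ R_{t+1} + \gamma \sum_a \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) - Q(S_t, A_t) \Big]
$$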
Apart from this change to the update rule, the algorithm follows the scheme of Q-learning. It is more computationally expensive than Sarsa, but in return it eliminates the variance due to the random selection of the next action, $A_{t+1}$.
Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
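A minimal tabular sketch of one Expected Sarsa update follows. It assumes an epsilon-greedy policy (the description above does not fix a particular policy), Q-values stored as a dict of per-state lists, and hypothetical helper names (`epsilon_greedy_probs`, `expected_sarsa_target`, `expected_sarsa_update`) chosen for illustration only.

```python
from collections import defaultdict


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities under an epsilon-greedy policy (an assumption).

    Every action gets epsilon / n; the greedy action (ties broken by the
    lowest index) additionally gets 1 - epsilon.
    """
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def expected_sarsa_target(Q, reward, next_state, gamma, epsilon, terminal=False):
    """R + gamma * sum_a pi(a | s') * Q(s', a); just R if s' is terminal."""
    if terminal:
        return reward
    q_next = Q[next_state]
    probs = epsilon_greedy_probs(q_next, epsilon)
    # Expectation over next actions instead of a single sampled A_{t+1}.
    expected_q = sum(p * q for p, q in zip(probs, q_next))
    return reward + gamma * expected_q


def expected_sarsa_update(Q, state, action, reward, next_state,
                          alpha, gamma, epsilon, terminal=False):
    """Move Q(state, action) toward the Expected Sarsa target by step alpha."""
    target = expected_sarsa_target(Q, reward, next_state, gamma, epsilon, terminal)
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


# Usage: two actions, all Q-values start at 0.0 (hypothetical toy values).
Q = defaultdict(lambda: [0.0, 0.0])
Q["s1"] = [1.0, 3.0]
new_q = expected_sarsa_update(Q, "s0", 0, reward=1.0, next_state="s1",
                              alpha=0.5, gamma=0.9, epsilon=0.2)
# pi(.|s1) = [0.1, 0.9], so the target is 1 + 0.9 * 2.8 = 3.52
print(new_q)  # approximately 1.76
```

Nothing in the update samples $A_{t+1}$, so for a given transition the target is the same every time; this is the variance reduction mentioned above, paid for by summing over all actions at each step.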