
What is: Sarsa?

Year: 1994
Data Source: CC BY-SA - https://paperswithcode.com

Sarsa is an on-policy TD control algorithm:

$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma Q\left(S_{t+1}, A_{t+1}\right) - Q\left(S_{t}, A_{t}\right)\right]$$

This update is done after every transition from a nonterminal state $S_{t}$. If $S_{t+1}$ is terminal, then $Q\left(S_{t+1}, A_{t+1}\right)$ is defined as zero.
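As a minimal sketch of a single Sarsa update, the snippet below applies the rule to a tabular Q stored in a dictionary. The function name `sarsa_update`, the `terminal` flag, the state and action labels, and the default values of `alpha` and `gamma` are illustrative assumptions, not part of the source.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, terminal, alpha=0.5, gamma=1.0):
    """Apply one Sarsa update to Q in place and return the TD error.

    Q maps (state, action) pairs to value estimates (hypothetical layout).
    """
    # Q(S_{t+1}, A_{t+1}) is defined as zero when S_{t+1} is terminal.
    next_q = 0.0 if terminal else Q[(s_next, a_next)]
    td_error = r + gamma * next_q - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

# Usage with made-up states and actions.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0

# Nonterminal transition: 0 + 0.5 * (1 + 0.9 * 2 - 0) = 1.4
sarsa_update(Q, "s0", "right", 1.0, "s1", "right",
             terminal=False, alpha=0.5, gamma=0.9)

# Terminal transition: the next-state value is treated as zero,
# so 2 + 0.5 * (5 + 0 - 2) = 3.5
sarsa_update(Q, "s1", "right", 5.0, None, None,
             terminal=True, alpha=0.5, gamma=0.9)
```

The update needs the next action $A_{t+1}$ before it can run, so the caller has to select that action from the current policy first.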

To design an on-policy control algorithm using Sarsa, we estimate $q_{\pi}$ for a behaviour policy $\pi$ and then change $\pi$ towards greediness with respect to $q_{\pi}$.
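One common way to realise this idea is to let the behaviour policy be ε-greedy with respect to the current Q estimates, so the policy drifts towards greediness as Q improves. The sketch below assumes that choice. The corridor environment (`step`, `N_STATES`, `ACTIONS`) is a hypothetical toy task invented for illustration, and the hyperparameter values are arbitrary.

```python
import random
from collections import defaultdict

N_STATES = 5        # states 0..4; state 4 is terminal (hypothetical toy task)
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Toy corridor dynamics: reward -1 per step until the right end is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, -1.0, next_state == N_STATES - 1

def epsilon_greedy(Q, state, epsilon, rng):
    """Behaviour policy: greedy w.r.t. Q, but random with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])

def sarsa(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # On-policy: the next action comes from the same behaviour policy.
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # The value of a terminal next state is taken to be zero.
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
    return Q

Q = sarsa()
# Greedy action in each nonterminal state under the learned estimates.
greedy_policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

In this toy task the shortest path is to keep moving right, so the greedy policy would be expected to favour `+1`, although the exact estimates depend on the seed and the hyperparameters.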

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition