What is: Sarsa Lambda?

Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

Sarsa($\lambda$) extends eligibility traces to action-value methods. It uses the same update rule as TD($\lambda$), but with the action-value form of the TD error:

$$\delta_{t} = R_{t+1} + \gamma\hat{q}\left(S_{t+1}, A_{t+1}, \mathbf{w}_{t}\right) - \hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right)$$

and the action-value form of the eligibility trace:

$$\mathbf{z}_{-1} = \mathbf{0}$$

$$\mathbf{z}_{t} = \gamma\lambda\mathbf{z}_{t-1} + \nabla\hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right), \quad 0 \leq t \leq T$$
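To make the two definitions concrete, here is a minimal sketch of a single Sarsa($\lambda$) step in Python. It rests on several assumptions that are not part of the text above: the approximator is linear, $\hat{q}(s, a, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s, a)$, so its gradient is simply the feature vector $\mathbf{x}(s, a)$; the weight update is the usual TD($\lambda$) form $\mathbf{w}_{t+1} = \mathbf{w}_{t} + \alpha\delta_{t}\mathbf{z}_{t}$; and the value of a terminal state is taken to be zero. The function name `sarsa_lambda_step`, its parameters, and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def sarsa_lambda_step(w, z, x, x_next, reward, alpha, gamma, lam, done=False):
    """One Sarsa(lambda) update, assuming a linear q_hat(s, a, w) = w @ x(s, a).

    w      : weight vector w_t
    z      : eligibility trace z_{t-1}
    x      : features of the current pair (S_t, A_t)           (assumed given)
    x_next : features of the next pair (S_{t+1}, A_{t+1})      (assumed given)
    """
    # Trace update: z_t = gamma * lambda * z_{t-1} + grad q_hat(S_t, A_t, w_t).
    # With a linear approximator the gradient is the feature vector x.
    z = gamma * lam * z + x

    # Action-value TD error:
    # delta_t = R_{t+1} + gamma * q_hat(S_{t+1}, A_{t+1}, w_t) - q_hat(S_t, A_t, w_t).
    q = w @ x
    q_next = 0.0 if done else w @ x_next  # assumption: terminal value is zero
    delta = reward + gamma * q_next - q

    # Weight update shared with TD(lambda): w_{t+1} = w_t + alpha * delta_t * z_t.
    w = w + alpha * delta * z
    return w, z, delta


# Toy usage with one-hot features for three state-action pairs.
w = np.zeros(3)
z = np.zeros(3)  # z_{-1} = 0
features = np.eye(3)

w, z, delta = sarsa_lambda_step(
    w, z, features[0], features[1], reward=1.0, alpha=0.5, gamma=0.9, lam=0.8
)
w, z, delta = sarsa_lambda_step(
    w, z, features[1], features[2], reward=1.0, alpha=0.5, gamma=0.9, lam=0.8
)
print(w, z, delta)
```

In the second step the trace still carries weight $\gamma\lambda = 0.72$ on the first state-action pair, so that pair's weight is also adjusted by the new TD error. This is the credit-assignment effect the eligibility trace is meant to provide.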

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition