Sarsa($\lambda$) extends eligibility traces to action-value methods. It uses the same update rule as TD($\lambda$), but with the action-value form of the TD error:
$$\delta_t = R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, \mathbf{w}_t) - \hat{q}(S_t, A_t, \mathbf{w}_t)$$
and the action-value form of the eligibility trace:
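$$\mathbf{z}_{-1} = \mathbf{0}$$
$$\mathbf{z}_t = \gamma \lambda \mathbf{z}_{t-1} + \nabla \hat{q}(S_t, A_t, \mathbf{w}_t), \quad 0 \le t \le T$$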
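As a rough illustration, the sketch below applies one Sarsa($\lambda$) step under the assumption of a linear approximator, $\hat{q}(s, a, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s, a)$, so that the gradient is just the feature vector $\mathbf{x}(s, a)$. The weight update $\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \delta_t \mathbf{z}_t$ is the TD($\lambda$) rule carried over unchanged. The function names, the feature vectors, and the `terminal` flag (which treats the value of a terminal state as zero) are hypothetical choices made for this example, not part of the source.

```python
from typing import List, Tuple


def q_hat(x: List[float], w: List[float]) -> float:
    """Linear action-value estimate (assumed form): q(s, a, w) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sarsa_lambda_step(
    w: List[float],       # weights w_t
    z: List[float],       # previous trace z_{t-1} (all zeros at episode start)
    x: List[float],       # features of (S_t, A_t); equals grad q for linear q
    reward: float,        # R_{t+1}
    x_next: List[float],  # features of (S_{t+1}, A_{t+1})
    alpha: float,         # step size
    gamma: float,         # discount factor
    lam: float,           # trace-decay parameter lambda
    terminal: bool = False,  # assumption: q of a terminal state is 0
) -> Tuple[List[float], List[float], float]:
    # Trace: z_t = gamma * lambda * z_{t-1} + grad q(S_t, A_t, w_t)
    z_new = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # TD error: delta_t = R_{t+1} + gamma * q(S_{t+1}, A_{t+1}, w_t) - q(S_t, A_t, w_t)
    q_next = 0.0 if terminal else q_hat(x_next, w)
    delta = reward + gamma * q_next - q_hat(x, w)
    # Weight update shared with TD(lambda): w_{t+1} = w_t + alpha * delta_t * z_t
    w_new = [wi + alpha * delta * zi for wi, zi in zip(w, z_new)]
    return w_new, z_new, delta


# Two steps with one-hot features for two hypothetical state-action pairs.
w, z = [0.0, 0.0], [0.0, 0.0]
w, z, d1 = sarsa_lambda_step(w, z, [1.0, 0.0], 1.0, [0.0, 1.0],
                             alpha=0.5, gamma=0.9, lam=0.8)
w, z, d2 = sarsa_lambda_step(w, z, [0.0, 1.0], 0.0, [1.0, 0.0],
                             alpha=0.5, gamma=0.9, lam=0.8)
print(w, z, d1, d2)
```

In the second step the trace still holds a decayed component for the first state-action pair, so its weight is adjusted again even though that pair is not the one currently visited. This is the credit-assignment effect the eligibility trace provides.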
Source: Sutton and Barto, Reinforcement Learning, 2nd Edition