What is: N-step Returns?

Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

$n$-step returns are used for value function estimation in reinforcement learning. Specifically, the $n$-step return sums the first $n$ discounted rewards and then bootstraps from the current value estimate of the state reached after $n$ steps:

$$R_{t}^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V_{t}\left(s_{t+n}\right)$$
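As a minimal sketch of the return above, the following Python function evaluates it for a list of observed rewards. The function name `n_step_return` and its parameters are hypothetical, chosen for illustration; `rewards` is assumed to hold $r_{t+1}, \ldots, r_{t+n}$ in order, and `bootstrap_value` is assumed to be the current estimate $V_t(s_{t+n})$.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Compute r_{t+1} + gamma*r_{t+2} + ... + gamma^(n-1)*r_{t+n} + gamma^n * V_t(s_{t+n}).

    rewards         -- [r_{t+1}, ..., r_{t+n}], so n = len(rewards) (assumed layout)
    bootstrap_value -- current value estimate V_t(s_{t+n})
    gamma           -- discount factor
    """
    # Work backwards from the bootstrap value: each step applies
    # g <- r + gamma * g, which unrolls to the discounted sum above.
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example with n = 3 and gamma = 0.5:
# 1 + 0.5*2 + 0.25*3 + 0.125*10 = 4.0
print(n_step_return([1.0, 2.0, 3.0], bootstrap_value=10.0, gamma=0.5))
```

Accumulating backwards avoids computing explicit powers of $\gamma$; with an empty reward list the function simply returns the bootstrap value.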

We can then write an $n$-step backup, in the style of TD learning, as:

$$\Delta V_{t}\left(s_{t}\right) = \alpha\left[R_{t}^{(n)} - V_{t}\left(s_{t}\right)\right]$$
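The backup can be sketched for a tabular value function as follows. This is an illustration under stated assumptions, not a reference implementation: `V` is assumed to be a plain dictionary mapping states to value estimates, and the name `n_step_td_update` and its parameters are hypothetical.

```python
def n_step_td_update(V, state, rewards, last_state, alpha, gamma):
    """Apply V(s_t) += alpha * (R_t^(n) - V(s_t)) in place and return the increment.

    V          -- dict mapping state -> value estimate (assumed tabular setting)
    state      -- s_t, the state being updated
    rewards    -- [r_{t+1}, ..., r_{t+n}]
    last_state -- s_{t+n}, the state reached after n steps
    alpha      -- step size
    gamma      -- discount factor
    """
    # n-step return: discounted rewards plus the bootstrapped value of s_{t+n}.
    target = V[last_state]
    for r in reversed(rewards):
        target = r + gamma * target

    # TD-style increment toward the n-step return.
    delta = alpha * (target - V[state])
    V[state] += delta
    return delta


# Example: the 3-step return is 4.0, so V["a"] moves halfway from 0.0 to 4.0.
V = {"a": 0.0, "b": 10.0}
delta = n_step_td_update(V, "a", [1.0, 2.0, 3.0], "b", alpha=0.5, gamma=0.5)
print(delta, V["a"])
```

With a single reward in `rewards` ($n = 1$) this reduces to the ordinary one-step TD(0) update.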

Multi-step returns often lead to faster learning with a suitably tuned $n$.

Image Credit: Sutton and Barto, Reinforcement Learning