What is: Eligibility Trace?

Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

An Eligibility Trace is a memory vector $\textbf{z}_{t} \in \mathbb{R}^{d}$ that parallels the long-term weight vector $\textbf{w}_{t} \in \mathbb{R}^{d}$. The idea is that when a component of $\textbf{w}_{t}$ participates in producing an estimated value, the corresponding component of $\textbf{z}_{t}$ is bumped up and then begins to fade away. Learning will then occur in that component of $\textbf{w}_{t}$ if a nonzero TD error occurs before the trace falls back to zero. The trace-decay parameter $\lambda \in \left[0, 1\right]$ determines the rate at which the trace falls.
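
To make the interaction between the trace and the weight vector concrete, here is a sketch of the semi-gradient TD($\lambda$) updates as given in Sutton and Barto. The symbols $\alpha$ (step size), $\gamma$ (discount rate), $R_{t+1}$ (reward), $\delta_{t}$ (TD error) and $\hat{v}$ (approximate value function) are not defined in the text above; they are assumed here to follow that book's notation.

$$\textbf{z}_{-1} = \textbf{0}, \qquad \textbf{z}_{t} = \gamma\lambda\,\textbf{z}_{t-1} + \nabla\hat{v}\left(S_{t}, \textbf{w}_{t}\right)$$

$$\delta_{t} = R_{t+1} + \gamma\,\hat{v}\left(S_{t+1}, \textbf{w}_{t}\right) - \hat{v}\left(S_{t}, \textbf{w}_{t}\right)$$

$$\textbf{w}_{t+1} = \textbf{w}_{t} + \alpha\,\delta_{t}\,\textbf{z}_{t}$$

Read this way, a component of $\textbf{w}_{t}$ changes only when both $\delta_{t}$ and the matching component of $\textbf{z}_{t}$ are nonzero, which is the "learning occurs before the trace falls back to zero" condition described above.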

Intuitively, eligibility traces tackle the credit assignment problem by capturing both a frequency heuristic (states that are visited more often deserve more credit) and a recency heuristic (states that are visited more recently deserve more credit).

$$E_{0}\left(s\right) = 0$$

$$E_{t}\left(s\right) = \gamma\lambda E_{t-1}\left(s\right) + \textbf{1}\left(S_{t} = s\right)$$
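
The recursion above can be sketched in a few lines of Python for the tabular case. This is a minimal illustration and not a reference implementation: the function name `update_traces`, the three-state example, and the values of `gamma` and `lam` are all hypothetical choices made for the demonstration.

```python
def update_traces(traces, visited_state, gamma, lam):
    """One step of the accumulating-trace recursion (hypothetical helper).

    Every trace is decayed by gamma * lam, then the trace of the state
    visited at this step is incremented by 1 (the indicator term).
    """
    decayed = {s: gamma * lam * e for s, e in traces.items()}
    decayed[visited_state] = decayed.get(visited_state, 0.0) + 1.0
    return decayed


# Illustrative setup: three states, all traces start at zero (E_0(s) = 0).
states = ["A", "B", "C"]
traces = {s: 0.0 for s in states}

# Assumed example values; gamma * lam = 0.25 keeps the arithmetic easy to follow.
gamma, lam = 0.5, 0.5
trajectory = ["A", "B", "A", "C"]

history = []
for s in trajectory:
    traces = update_traces(traces, s, gamma, lam)
    history.append(dict(traces))

for step, snapshot in enumerate(history, start=1):
    print(step, snapshot)
```

In this run, state A is visited twice, so its trace after the third step (1.0625) exceeds 1, which reflects the frequency heuristic. After the fourth step, the most recently visited state C has the largest trace, which reflects the recency heuristic.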

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition