What is: Generalized State-Dependent Exploration?
| Source | Smooth Exploration for Robotic Reinforcement Learning |
| ----------- | ----------- |
| Year | 2021 |
| Data Source | CC BY-SA - https://paperswithcode.com |
Generalized State-Dependent Exploration, or gSDE, is an exploration method for reinforcement learning that improves upon State-Dependent Exploration (SDE) by using more general features and by re-sampling the noise periodically.
State-Dependent Exploration (SDE) is an intermediate solution for exploration that consists in adding noise as a function of the state, $\epsilon(s; \theta_\epsilon)$, to the deterministic action $\mu(s; \theta_\mu)$. At the beginning of an episode, the parameters $\theta_\epsilon$ of that exploration function are drawn from a Gaussian distribution. The resulting action is as follows:

$$\mathbf{a}_t = \mu(s_t; \theta_\mu) + \epsilon(s_t; \theta_\epsilon), \qquad \theta_\epsilon \sim \mathcal{N}(0, \sigma^2)$$
This episode-based exploration is smoother and more consistent than unstructured step-based exploration: during one episode, instead of oscillating around a mean value, the action $\mathbf{a}$ for a given state $s$ will be the same.
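To make the mechanics concrete, here is a minimal NumPy sketch of linear SDE. The linear policy `W_mu` and all dimensions are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 2

# Hypothetical deterministic policy mu(s); a linear map stands in for a network.
W_mu = rng.standard_normal((action_dim, state_dim))
mu = lambda s: W_mu @ s

# sigma: per-element standard deviation of the exploration parameters theta_eps.
sigma = np.full((action_dim, state_dim), 0.5)

# Drawn once at the start of the episode and kept fixed afterwards.
theta_eps = rng.normal(0.0, sigma)

def sde_action(s):
    # a = mu(s) + eps(s; theta_eps), with linear noise eps = theta_eps @ s
    return mu(s) + theta_eps @ s

s = np.array([0.1, -0.4, 0.7])
print(sde_action(s))
print(sde_action(s))  # identical: same state, same theta_eps -> same action
```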
In the case of a linear exploration function $\epsilon(s; \theta_\epsilon) = \theta_\epsilon s$, by operation on Gaussian distributions, Rückstieß et al. show that the action element $a_j$ is normally distributed:

$$a_j \sim \mathcal{N}\big(\mu_j(s), \hat{\sigma}_j^2\big)$$

where $\hat{\sigma}$ is a diagonal matrix with elements $\hat{\sigma}_j = \sqrt{\sum_i (\sigma_{ij} s_i)^2}$.
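This closed form is easy to verify numerically. In the sketch below (dimensions and values are arbitrary, with $\sigma_{ij}$ stored as `sigma[j, i]`), the empirical standard deviation of actions over many simulated episodes is compared against $\hat{\sigma}_j$:

```python
import numpy as np

rng = np.random.default_rng(1)
action_dim, state_dim = 2, 3
sigma = rng.uniform(0.2, 0.8, size=(action_dim, state_dim))
s = np.array([0.1, -0.4, 0.7])
mu_s = np.zeros(action_dim)  # without loss of generality, take mu(s) = 0

# Empirical std of a = mu(s) + theta_eps @ s over many sampled theta_eps
thetas = rng.normal(0.0, sigma, size=(100_000, action_dim, state_dim))
actions = mu_s + thetas @ s
print(actions.std(axis=0))

# Closed form: sigma_hat_j = sqrt(sum_i (sigma_ij * s_i)^2)
print(np.sqrt((sigma ** 2 * s ** 2).sum(axis=1)))
```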
Because we know the policy distribution, we can obtain the derivative of the log-likelihood $\log \pi(\mathbf{a} \mid s)$ with respect to the variance $\sigma$:

$$\frac{\partial \log \pi(\mathbf{a} \mid s)}{\partial \sigma_{ij}} = \frac{(a_j - \mu_j)^2 - \hat{\sigma}_j^2}{\hat{\sigma}_j^3} \, \frac{\partial \hat{\sigma}_j}{\partial \sigma_{ij}}, \qquad \frac{\partial \hat{\sigma}_j}{\partial \sigma_{ij}} = \frac{\sigma_{ij}\, s_i^2}{\hat{\sigma}_j}$$
This can be easily plugged into the likelihood ratio gradient estimator, which allows $\sigma$ to be adapted during training. SDE is therefore compatible with standard policy gradient methods, while addressing most shortcomings of unstructured exploration.
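As a sanity check, the derivative above (dropping the constant term of the Gaussian log-density, and again with illustrative values) can be compared against a finite-difference estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
A, S = 2, 3
sigma = rng.uniform(0.2, 0.8, size=(A, S))   # sigma[j, i]
s = np.array([0.1, -0.4, 0.7])
mu_s = np.zeros(A)
a = np.array([0.3, -0.2])

def log_pi(sigma):
    # Gaussian log-likelihood of a given s, up to an additive constant
    sig_hat = np.sqrt((sigma ** 2 * s ** 2).sum(axis=1))
    return (-np.log(sig_hat) - (a - mu_s) ** 2 / (2 * sig_hat ** 2)).sum()

# Analytic gradient, built from the two factors in the formula above
sig_hat = np.sqrt((sigma ** 2 * s ** 2).sum(axis=1))
d_loglik_d_sighat = ((a - mu_s) ** 2 - sig_hat ** 2) / sig_hat ** 3  # shape (A,)
d_sighat_d_sigma = sigma * s ** 2 / sig_hat[:, None]                 # shape (A, S)
grad = d_loglik_d_sighat[:, None] * d_sighat_d_sigma

# Finite-difference check on one entry
eps = 1e-6
sigma_p = sigma.copy(); sigma_p[0, 1] += eps
print(grad[0, 1], (log_pi(sigma_p) - log_pi(sigma)) / eps)
```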
For gSDE, two improvements are suggested:
- We sample the parameters $\theta_\epsilon$ of the exploration function every $n$ steps instead of every episode.
- Instead of the state $s$, we can in fact use any features. We choose policy features $z_\mu$ (the output of the last layer before the deterministic output $\mu(s)$) as input to the noise function $\epsilon(z_\mu; \theta_\epsilon)$; a sketch combining both changes follows this list.
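A minimal sketch of both modifications, assuming a hypothetical two-layer policy whose hidden activation plays the role of $z_\mu$; the resampling interval `n` is a tunable hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(3)
state_dim, feat_dim, action_dim = 3, 4, 2

# Hypothetical two-layer policy: z = tanh(W1 @ s) are the "policy features"
# (last layer before the deterministic output), mu(s) = W2 @ z.
W1 = rng.standard_normal((feat_dim, state_dim))
W2 = rng.standard_normal((action_dim, feat_dim))
sigma = np.full((action_dim, feat_dim), 0.5)

n = 8              # resampling interval (gSDE change 1)
theta_eps = None

def gsde_action(s, t):
    global theta_eps
    if t % n == 0:                    # re-sample every n steps, not per episode
        theta_eps = rng.normal(0.0, sigma)
    z = np.tanh(W1 @ s)               # policy features as noise input (change 2)
    return W2 @ z + theta_eps @ z     # a = mu(s) + eps(z; theta_eps)

for t in range(10):
    a = gsde_action(np.array([0.1, -0.4, 0.7]), t)
    if t in (0, 7, 8):
        print(t, a)  # constant within a window, changes when t reaches n = 8
```

A production implementation of gSDE ships with Stable-Baselines3, enabled by passing `use_sde=True` to the supported algorithms.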