Overview
An alternative to both model-free and model-based representations of a reinforcement learning environment is the successor representation, which encodes for each state a discounted "occupancy count" of the other states the agent expects to visit when following some policy.
Successor representation
The successor representation for a policy $\pi$ is defined for all pairs $(s, s')$, where $s$ is a current state and $s'$ is a future state, by
$$M^\pi(s, s') = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t \, \mathbb{1}[S_t = s'] \,\middle|\, S_0 = s\right],$$
where $\mathbb{1}[S_t = s']$ is the indicator function that returns $1$ when the state at time $t$ is $s'$ and $0$ otherwise.
- Note that the indicator function plays the role of a reward function that pays $1$ only in state $s'$; under that reward, $M^\pi(\cdot, s')$ is the corresponding value function.
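As a concrete illustration of the definition, the sketch below estimates the SR of a small Markov chain by Monte Carlo, averaging discounted visit counts over rollouts. The transition matrix `P_pi`, the discount `gamma`, and the rollout parameters are illustrative assumptions, not part of the source.

```python
import numpy as np

# Minimal sketch: estimate M^pi(s, s') directly from the definition, as an
# expected discounted count of visits to s' when starting from s.
# The 3-state chain P_pi and gamma below are illustrative assumptions.
rng = np.random.default_rng(0)
gamma = 0.9
P_pi = np.array([[0.1, 0.9, 0.0],   # row s: P(next state | s) under policy pi
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])
n_states = P_pi.shape[0]

def estimate_sr(s0, n_rollouts=2000, horizon=200):
    """Average discounted occupancy of each state over truncated rollouts from s0."""
    m = np.zeros(n_states)
    for _ in range(n_rollouts):
        s = s0
        for t in range(horizon):
            m[s] += gamma ** t          # adds gamma^t * 1[S_t = s'] for every s'
            s = rng.choice(n_states, p=P_pi[s])
    return m / n_rollouts

M_hat = np.stack([estimate_sr(s) for s in range(n_states)])
print(M_hat)   # row s is an estimate of M^pi(s, .); each row sums to ~ 1 / (1 - gamma)
```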
Recursive equations for SR
Equation: Recursive equations for SR
For any policy $\pi$, the successor representation for a current state $s$ and future state $s'$ can be computed by
$$M^\pi(s, s') = \mathbb{1}[s = s'] + \gamma \sum_{a} \pi(a \mid s) \sum_{\tilde{s}} P(\tilde{s} \mid s, a)\, M^\pi(\tilde{s}, s').$$
If $\pi$ is a deterministic policy, we have
$$M^\pi(s, s') = \mathbb{1}[s = s'] + \gamma \sum_{\tilde{s}} P(\tilde{s} \mid s, \pi(s))\, M^\pi(\tilde{s}, s').$$
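Stacking the recursive equations over all state pairs gives a linear system: in matrix form, $M^\pi = I + \gamma P^\pi M^\pi$, so $M^\pi = (I - \gamma P^\pi)^{-1}$. The sketch below solves the recursion both ways; the transition matrix and discount are the same illustrative assumptions used above.

```python
import numpy as np

# Minimal sketch: solve the recursive SR equations in matrix form,
# M = I + gamma * P_pi @ M, either in closed form or by fixed-point iteration.
# P_pi (state-to-state transitions under pi) and gamma are assumed as above.
gamma = 0.9
P_pi = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])
I = np.eye(P_pi.shape[0])

# Closed form: M = (I - gamma * P_pi)^(-1)
M_closed = np.linalg.inv(I - gamma * P_pi)

# Fixed-point iteration of the recursion M <- I + gamma * P_pi @ M
M_iter = np.zeros_like(P_pi)
for _ in range(500):
    M_iter = I + gamma * P_pi @ M_iter

print(np.allclose(M_closed, M_iter))  # True
```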
Computing state value from the SR
Expressing the value of a state using SR and state reward
Let $\pi$ be a policy for acting in a Markov decision process with reward function $R(s, a)$, and let
$$r^\pi(s) = \sum_{a} \pi(a \mid s)\, R(s, a)$$
be the state reward function associated with $\pi$. Then the state value function $V^\pi$ can be computed as a linear combination of the successor representation and state reward function:
$$V^\pi(s) = \sum_{s'} M^\pi(s, s')\, r^\pi(s').$$
Proof adapted from @2024griffithsBayesian, p. 214. We have
$$\begin{aligned}
\sum_{s'} M^\pi(s, s')\, r^\pi(s')
&= \sum_{s'} r^\pi(s')\, \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\, \mathbb{1}[S_t = s'] \,\middle|\, S_0 = s\right] \\
&= \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t \sum_{s'} \mathbb{1}[S_t = s']\, r^\pi(s') \,\middle|\, S_0 = s\right] \\
&= \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r^\pi(S_t) \,\middle|\, S_0 = s\right] \\
&= r^\pi(s) + \gamma\, \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r^\pi(S_{t+1}) \,\middle|\, S_0 = s\right] \\
&= r^\pi(s) + \gamma \sum_{\tilde{s}} P^\pi(\tilde{s} \mid s)\, V^\pi(\tilde{s}) = V^\pi(s),
\end{aligned}$$
where $P^\pi(\tilde{s} \mid s) = \sum_a \pi(a \mid s)\, P(\tilde{s} \mid s, a)$ is the state-to-state transition probability under $\pi$, the fourth line expresses the expected return as a sum of expected immediate reward and expected discounted future value, and the final equality gives the familiar Bellman equation for state value.
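The identity is easy to check numerically: compute $V^\pi$ as the SR-weighted sum of state rewards and compare it with a direct solve of the Bellman equation. The chain, discount, and state reward vector below are illustrative assumptions carried over from the earlier sketches.

```python
import numpy as np

# Minimal sketch: V^pi(s) = sum_{s'} M^pi(s, s') * r^pi(s'), checked against a
# direct Bellman solve. P_pi, gamma, and r_pi are illustrative assumptions.
gamma = 0.9
P_pi = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])
r_pi = np.array([0.0, 1.0, 0.0])              # state reward function r^pi(s)

M = np.linalg.inv(np.eye(3) - gamma * P_pi)   # successor representation
V_from_sr = M @ r_pi                          # linear combination of SR and rewards

# Direct Bellman solve: V = r + gamma * P_pi V  =>  (I - gamma * P_pi) V = r
V_bellman = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)

print(np.allclose(V_from_sr, V_bellman))      # True
```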