Overview

An alternative to both model-free and model-based representations of a reinforcement learning environment is the successor representation (SR), which encodes for each state a discounted expected “occupancy count”: how often, and how soon, the agent expects to visit each other state when following some policy.

Successor representation

The successor representation $M^\pi$ for a policy $\pi$ is defined for all state pairs $(s, s')$, where $s$ is a current state and $s'$ is a future state, by

$$M^\pi(s, s') = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t\, \mathbb{1}(S_t = s') \,\middle|\, S_0 = s \right],$$

where $\mathbb{1}(S_t = s')$ is the indicator function that returns $1$ when the current state $S_t$ is $s'$ and $0$ otherwise, and $\gamma \in [0, 1)$ is the discount factor.

  • Indicator fn like reward fn? Yes: $M^\pi(s, s')$ is exactly the state value $V^\pi(s)$ under a reward function that pays $1$ in state $s'$ and $0$ everywhere else.
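
To make the definition concrete, here is a minimal Monte Carlo sketch that estimates $M^\pi$ by sampling trajectories and accumulating the discounted indicator. The three-state chain, its transition probabilities `P_pi`, and all numbers are illustrative assumptions, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9                      # discount factor (illustrative)
n_states = 3
# Illustrative policy-induced transition probabilities P^pi(s' | s).
P_pi = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])

def mc_successor_representation(n_episodes=1000, horizon=100):
    """Estimate M[s, s'] = E_pi[ sum_t gamma^t * 1(S_t = s') | S_0 = s ] by sampling."""
    M = np.zeros((n_states, n_states))
    for s0 in range(n_states):
        for _ in range(n_episodes):
            s, discount = s0, 1.0
            for _ in range(horizon):               # truncate the infinite sum
                M[s0, s] += discount / n_episodes  # the indicator adds gamma^t at the visited state
                discount *= gamma
                s = rng.choice(n_states, p=P_pi[s])
    return M

print(np.round(mc_successor_representation(), 2))
```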

Recursive equations for SR

Equation: Recursive equations for SR

For any policy $\pi$, the successor representation for a current state $s$ and future state $s'$ can be computed by

$$M^\pi(s, s') = \mathbb{1}(s = s') + \gamma \sum_{a} \pi(a \mid s) \sum_{s''} P(s'' \mid s, a)\, M^\pi(s'', s').$$

If $\pi$ is a deterministic policy, we have

$$M^\pi(s, s') = \mathbb{1}(s = s') + \gamma \sum_{s''} P\big(s'' \mid s, \pi(s)\big)\, M^\pi(s'', s').$$
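
In matrix form the recursion reads $M^\pi = I + \gamma P^\pi M^\pi$, where $P^\pi$ is the policy-induced state transition matrix, so for a small tabular MDP it can be solved exactly as $M^\pi = (I - \gamma P^\pi)^{-1}$ or by iterating the recursion. A minimal sketch, with the matrix `P_pi` and discount as illustrative assumptions:

```python
import numpy as np

gamma = 0.9
P_pi = np.array([[0.1, 0.9, 0.0],     # illustrative P^pi(s' | s)
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])

def sr_closed_form(P_pi, gamma):
    """Solve M = I + gamma * P_pi @ M exactly: M = (I - gamma * P_pi)^{-1}."""
    n = P_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P_pi)

def sr_dynamic_programming(P_pi, gamma, n_sweeps=500):
    """Iterate the recursive equation; converges because gamma < 1."""
    M = np.zeros_like(P_pi)
    for _ in range(n_sweeps):
        M = np.eye(P_pi.shape[0]) + gamma * P_pi @ M
    return M

print(np.allclose(sr_closed_form(P_pi, gamma),
                  sr_dynamic_programming(P_pi, gamma)))  # True
```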


Computing state value from the SR

Expressing the value of a state using SR and state reward

Let $\pi$ be a policy for acting in a Markov decision process with reward function $R$, and let

$$r^\pi(s) = \mathbb{E}_\pi\!\left[ R_{t+1} \mid S_t = s \right]$$

be the state reward function associated with $\pi$. Then the state value function $V^\pi$ can be computed as a linear combination of the successor representation and the state reward function:

$$V^\pi(s) = \sum_{s'} M^\pi(s, s')\, r^\pi(s').$$
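
As a quick illustration of the identity, a sketch using the same illustrative three-state MDP as above, with a made-up state reward vector `r_pi`:

```python
import numpy as np

gamma = 0.9
P_pi = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])      # illustrative P^pi(s' | s)
r_pi = np.array([0.0, 1.0, 0.0])        # illustrative state reward r^pi(s)

M = np.linalg.inv(np.eye(3) - gamma * P_pi)  # successor representation
V = M @ r_pi                                 # V^pi(s) = sum_{s'} M(s, s') * r^pi(s')
print(np.round(V, 3))
```

Since the value is linear in $r^\pi$, the same $M^\pi$ re-evaluates the policy under a new reward simply by swapping in a different $r^\pi$.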

Proof adapted from @2024griffithsBayesian, p. 214. We have

$$
\begin{aligned}
\sum_{s'} M^\pi(s, s')\, r^\pi(s')
&= \sum_{s'} \Big[ \mathbb{1}(s = s') + \gamma \sum_{s''} P^\pi(s'' \mid s)\, M^\pi(s'', s') \Big] r^\pi(s') \\
&= r^\pi(s) + \gamma \sum_{s''} P^\pi(s'' \mid s) \sum_{s'} M^\pi(s'', s')\, r^\pi(s') \\
&= r^\pi(s) + \gamma \sum_{s''} P^\pi(s'' \mid s)\, V^\pi(s'') \\
&= V^\pi(s),
\end{aligned}
$$

where $P^\pi(s'' \mid s) = \sum_a \pi(a \mid s)\, P(s'' \mid s, a)$ is the policy-induced transition distribution, the second line uses the recursive equation for the SR, the fourth line expresses the quantity as a sum of expected immediate reward and expected discounted future value, and the final equality gives the familiar Bellman equation for state value. (Formally, the derivation shows that $s \mapsto \sum_{s'} M^\pi(s, s')\, r^\pi(s')$ satisfies the Bellman equation; since for $\gamma < 1$ the Bellman equation has a unique solution, this function must equal $V^\pi$.)
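
As a numerical sanity check of the result (same illustrative MDP as above), the SR-based value should agree with iterative policy evaluation, which applies the Bellman update $V \leftarrow r^\pi + \gamma P^\pi V$ directly:

```python
import numpy as np

gamma = 0.9
P_pi = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])      # illustrative P^pi(s' | s)
r_pi = np.array([0.0, 1.0, 0.0])        # illustrative r^pi(s)

# Value from the successor representation: V = M r^pi.
V_sr = np.linalg.inv(np.eye(3) - gamma * P_pi) @ r_pi

# Iterative policy evaluation: repeatedly apply the Bellman equation.
V = np.zeros(3)
for _ in range(1000):
    V = r_pi + gamma * P_pi @ V

print(np.allclose(V, V_sr))  # True
```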