The Markov property

In a reinforcement learning framework, the Markov property is required for agents to make decisions as a function of the environment's state signal (i.e., the information available to the agent), because it ensures that the state is a sufficient statistic for predicting future states and rewards.

Markov property

A state signal at time $t$ has the Markov property, and is a Markov state, if and only if the environment's response at time $t + 1$ can be specified by the distribution
$p(s', r \mid s, a) = \Pr\{R_{t+1} = r, S_{t+1} = s' \mid S_t = s, A_t = a\} = \Pr\{R_{t+1} = r, S_{t+1} = s' \mid S_0, A_0, R_1, \dots, S_{t-1}, A_{t-1}, R_t, S_t = s, A_t = a\}$
for all $r$, $s'$, and all possible values of the past events $S_0, A_0, R_1, \dots, S_{t-1}, A_{t-1}, R_t, S_t, A_t$.
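When the environment's response to a history is known exactly, the definition can be checked mechanically. The following is a hypothetical Python sketch, not part of the original note: `is_markov`, the two toy dynamics functions, the states `"low"`/`"high"`, and the actions `"wait"`/`"push"` are all invented for illustration. Each history is stored as a flat tuple $(S_0, A_0, R_1, \dots, R_t, S_t, A_t)$, and the check reports whether histories that share the same $(S_t, A_t)$ always receive the same distribution over $(s', r)$. Because it only inspects the histories it is given, it can refute the Markov property, but it can confirm the property only for those histories.

```python
from fractions import Fraction
from typing import Any, Callable, Dict, List, Tuple

# A history (S_0, A_0, R_1, ..., S_{t-1}, A_{t-1}, R_t, S_t, A_t) is a flat tuple,
# so h[-2] is S_t, h[-1] is A_t and h[-4] is A_{t-1} (when t >= 1).
History = Tuple[Any, ...]
# Maps each outcome (s', r) to Pr{S_{t+1} = s', R_{t+1} = r | history}.
Dist = Dict[Tuple[Any, int], Fraction]


def is_markov(response: Callable[[History], Dist], histories: List[History]) -> bool:
    """Return False if two listed histories share (S_t, A_t) but get different responses."""
    by_last_pair: Dict[Tuple[Any, Any], Dist] = {}
    for h in histories:
        key = (h[-2], h[-1])
        dist = response(h)
        if key in by_last_pair and by_last_pair[key] != dist:
            return False  # same (S_t, A_t), different response: earlier history matters
        by_last_pair.setdefault(key, dist)
    return True


def markov_response(h: History) -> Dist:
    """Hypothetical dynamics that read only the current action A_t (and ignore the rest)."""
    action = h[-1]
    if action == "push":
        return {("high", 1): Fraction(3, 4), ("low", 0): Fraction(1, 4)}
    return {("low", 0): Fraction(1)}


def non_markov_response(h: History) -> Dist:
    """Hypothetical dynamics where hidden 'momentum' from A_{t-1} also matters."""
    if len(h) >= 5 and h[-4] == "push" and h[-1] == "push":
        return {("high", 1): Fraction(1)}
    return markov_response(h)


HISTORIES: List[History] = [
    ("low", "push"),                    # t = 0
    ("low", "wait", 0, "low", "push"),  # t = 1, previous action "wait"
    ("low", "push", 0, "low", "push"),  # t = 1, previous action "push"
]

print(is_markov(markov_response, HISTORIES))      # True
print(is_markov(non_markov_response, HISTORIES))  # False
```

Exact `Fraction` probabilities let distributions be compared with `==` without a floating-point tolerance. In the second environment the state signal omits the momentum carried by the previous action, so $S_t$ is not a sufficient statistic and the check fails.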

BONNIE'S NOTES
