Overview
Traditional decision theory aims to assess the behavioral interactions between an agent and its environment by operationalizing the agent’s external behavior as a strategy and assumptions about what is “good” or “bad” for the agent as utility.
Formally, a sequential decision problem is a tuple $\langle \mathcal S, \mathcal A, q, u \rangle$, where $\mathcal S$ is a set of states, $\mathcal A$ is a set of actions, $q$ is a transition function mapping history-action pairs to distributions over states, and $u$ is a utility function associating a reward to each action after following a certain history.
Related notes:
- Markov decision processes: a special case of sequential decision problems that satisfy the Markov property, meaning that whenever two histories $h, h'$ agree on their final state, they induce the same mapping $q(\cdot \mid h, a) = q(\cdot \mid h', a)$ and precisely equal rewards $u(h, a) = u(h', a)$.
- (Resource-)rational analysis: a framework distinguished from traditional decision theory by accounting for internal structures that impose constraints on optimal behavior.
Preliminaries: Conceptual primitives of Bayesian decision theory
States, observations, and probabilities
- States: a set $\mathcal S$ consisting of ways the world may be at any given time; they include everything that might “matter” to an agent.
- Probability of a state: the value $p(s)$ of a function $p$ assigning a probability to each state $s \in \mathcal S$; the fact that this is a probability means simply that the axiom $\sum_{s \in \mathcal S} p(s) = 1$ is satisfied. The set of all probability distributions on a set $X$ is denoted $\Delta(X)$.
- Observations: a set $\mathcal O$ of (potentially imperfect) inputs taken by the agent to reflect each state.
- concept-question In a bijection with the set of states?
- Likelihood of an observation: for each state $s$, a function $p(\cdot \mid s) \in \Delta(\mathcal O)$ for which $p(o \mid s)$ specifies the probability of observing $o$ in the current state $s$.
- Posterior probability of a state: after observing $o$, the probability that $s$ is what generated $o$ is given by Bayes’ rule $p(s \mid o) = \frac{p(o \mid s)\, p(s)}{\sum_{s' \in \mathcal S} p(o \mid s')\, p(s')}$ or, equivalently, the proportionality $p(s \mid o) \propto p(o \mid s)\, p(s)$.
- concept-question ?
- For a sequence of (independently generated) observations $o_1, \dots, o_n$, the posterior is proportional to the product $p(s \mid o_1, \dots, o_n) \propto p(s) \prod_{i=1}^{n} p(o_i \mid s)$, as sketched below.
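To make the update concrete, here is a minimal Python sketch of the posterior computation over a discrete state set (the states, prior, and likelihoods are invented for illustration):

```python
from math import prod

def posterior(prior, likelihood, observations):
    """p(s | o_1, ..., o_n) ∝ p(s) · Π_i p(o_i | s), then normalize."""
    unnormalized = {
        s: prior[s] * prod(likelihood[s][o] for o in observations)
        for s in prior
    }
    z = sum(unnormalized.values())
    return {s: w / z for s, w in unnormalized.items()}

# Hypothetical two-state example: does "wet" evidence indicate rain?
prior = {"rain": 0.3, "sun": 0.7}
likelihood = {                        # p(o | s) for each state s
    "rain": {"wet": 0.9, "dry": 0.1},
    "sun":  {"wet": 0.2, "dry": 0.8},
}
print(posterior(prior, likelihood, ["wet", "wet"]))
# posterior mass shifts toward "rain" after two independent "wet" observations
```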
Actions
- Actions: a set $\mathcal A$ whose elements are minimally understood as events that are directly under an agent’s control.
- Stochastic or non-deterministic behavior can be represented by the set $\Delta(\mathcal A)$ of probability distributions on $\mathcal A$, where each element $\pi \in \Delta(\mathcal A)$ gives the probability $\pi(a)$ with which an agent takes a given action $a$.
Utility
- Utility function: a map $u : \mathcal S \times \mathcal A \to \mathbb R$ that associates each state-action pair $(s, a)$ with a real-valued quantity $u(s, a)$ that measures, roughly, how “good” or “desirable” it is to take action $a$ in situation $s$.
- Expected utility of an action: when the state $s$ is uncertain, the expected utility of taking $a$ is the average of $u(s, a)$ when $s$ is drawn with probability $p(s)$: $\mathrm{EU}(a) = \sum_{s \in \mathcal S} p(s)\, u(s, a)$. Alternatively, if the actions have uncertain outcomes and utility depends only on the state, we use the expected utility $\mathrm{EU}(a) = \sum_{s \in \mathcal S} p(s \mid a)\, u(s)$. A sketch of both forms follows.
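A short Python sketch of both forms of expected utility (the state names, probabilities, and utilities are hypothetical):

```python
def expected_utility(p, u, a):
    """EU(a) = Σ_s p(s) · u(s, a): state uncertainty, utility on (s, a) pairs."""
    return sum(p[s] * u[(s, a)] for s in p)

def expected_utility_outcomes(p_given_a, u, a):
    """EU(a) = Σ_s p(s | a) · u(s): uncertain outcomes, utility on states only."""
    return sum(p_given_a[a][s] * u[s] for s in p_given_a[a])

# Hypothetical umbrella decision under a belief over the weather.
p = {"rain": 0.3, "sun": 0.7}
u = {("rain", "umbrella"): 1.0, ("rain", "none"): -1.0,
     ("sun", "umbrella"): 0.2, ("sun", "none"): 1.0}
best = max(["umbrella", "none"], key=lambda a: expected_utility(p, u, a))
print(best, expected_utility(p, u, best))  # "umbrella" with EU 0.44
```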
Sequential decision problems and strategies
History
Let $\mathcal S$ and $\mathcal A$ be sets of possible states and actions, respectively. A history is a sequence $h = (s_0, a_0, s_1, a_1, \dots, s_t)$ of length $t$ produced by an interaction between an agent and its environment up to time $t$, where each action $a_i$ produces a new state $s_{i+1}$. The set of all histories of length $t$ is denoted $\mathcal H_t$; we write $\mathcal H = \bigcup_{t \geq 0} \mathcal H_t$ for the set of all histories.
Sequential decision problem
A sequential decision problem is a tuple $\langle \mathcal S, \mathcal A, q, u \rangle$, where:
- $\mathcal S$ is a set of states;
- $\mathcal A$ is a set of actions;
- $q : \mathcal H \times \mathcal A \to \Delta(\mathcal S)$ is a transition function capturing environmental dynamics, with $q(s' \mid h, a)$ giving the probability of reaching $s'$ in one step following history $h$ when the agent performs action $a$;
- $u : \mathcal H \times \mathcal A \to \mathbb R$ is the utility function, with $u(h, a)$ giving the utility or reward obtained from taking action $a$ after following history $h$.
Strategy
A strategy, also known as a policy, is a function $\pi : \mathcal H \to \Delta(\mathcal A)$ which specifies a distribution on actions for every possible history $h \in \mathcal H$. We write $\pi(a \mid h)$ for the probability of action $a$ at history $h$. A stochastic policy is also known as a mixed strategy.
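These definitions translate directly into code. The following Python sketch of a sequential decision problem and a policy rollout uses interface names of my own choosing (SequentialDecisionProblem, rollout), not anything from @2025icardResource:

```python
import random
from dataclasses import dataclass
from typing import Callable

History = tuple   # (s_0, a_0, s_1, ..., s_t): alternating states and actions
Dist = dict       # a finite distribution: outcome -> probability

@dataclass
class SequentialDecisionProblem:
    """The tuple ⟨S, A, q, u⟩ from the definition above."""
    states: frozenset
    actions: frozenset
    q: Callable[[History, object], Dist]   # q(· | h, a) ∈ Δ(S)
    u: Callable[[History, object], float]  # u(h, a) ∈ ℝ

def sample(dist: Dist):
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def rollout(problem, policy, s0, horizon):
    """Simulate a (possibly stochastic) policy π : H → Δ(A) for `horizon` steps."""
    h, total = (s0,), 0.0
    for _ in range(horizon):
        a = sample(policy(h))         # draw a ~ π(· | h)
        total += problem.u(h, a)      # collect the reward u(h, a)
        s = sample(problem.q(h, a))   # draw s' ~ q(· | h, a)
        h = h + (a, s)
    return h, total
```

Histories are kept as plain tuples so that $q$, $u$, and the policy can inspect the full interaction; the MDPs below only ever need the last state.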
Markov decision processes and stationary strategies
Primary note: Markov decision processes
Markov decision process
A Markov decision process is a sequential decision problem $\langle \mathcal S, \mathcal A, q, u \rangle$ which satisfies the following:
- (i) For all histories $h, h' \in \mathcal H$ that agree on their last state and all $a \in \mathcal A$, we have $q(\cdot \mid h, a) = q(\cdot \mid h', a)$;
- (ii) Whenever $h, h' \in \mathcal H$ agree on their last state, the rewards also agree: $u(h, a) = u(h', a)$ for all $a \in \mathcal A$.
Thus, when considering MDPs we write $q(s' \mid s, a)$ and $u(s, a)$ instead of $q(s' \mid h, a)$ and $u(h, a)$, respectively.
Stationary strategy
A strategy or policy is said to be stationary if it depends only on the current state, i.e., $\pi$ is a map $\pi : \mathcal S \to \Delta(\mathcal A)$.
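Because both $q$ and $u$ only consult the last state, a stationary policy can be tabulated directly by state. A self-contained Python sketch with an invented two-state maintenance MDP:

```python
import random

# Hypothetical two-state MDP: transitions q(s' | s, a) and rewards u(s, a)
# depend only on the current state, so a stationary policy π : S → Δ(A) suffices.
q = {("ok", "repair"):     {"ok": 0.95, "broken": 0.05},
     ("ok", "ignore"):     {"ok": 0.70, "broken": 0.30},
     ("broken", "repair"): {"ok": 0.80, "broken": 0.20},
     ("broken", "ignore"): {"broken": 1.00}}
u = {("ok", "repair"): -1.0, ("ok", "ignore"): 2.0,
     ("broken", "repair"): -3.0, ("broken", "ignore"): -2.0}
pi = {"ok":     {"repair": 0.1, "ignore": 0.9},   # stationary: keyed by state
      "broken": {"repair": 1.0, "ignore": 0.0}}

def sample(dist):
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def simulate(s, steps):
    total = 0.0
    for _ in range(steps):
        a = sample(pi[s])        # π(· | s): depends only on the current state
        total += u[(s, a)]
        s = sample(q[(s, a)])
    return total

print(simulate("ok", 100))
```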
Selected quotations
- Lewis (1974, quoted in @2025icardResource) on expected utility theory as “a yardstick for rational choice”: Expected utility theory “is the very core of our common-sense theory of persons, dissected out and elegantly systematized”. (26)
- Meek and Glymour (1994, quoted in @2025icardResource) on the tension between predictability of decisions and overriding external causal influences: “One may view decisions, one’s own or another’s, as the result of a dual system with a default part and an extraordinary part—the default part subject to causes that may also influence the outcome through another mechanism, but the extraordinary part not so influenced and having the power to intervene and displace or modify the productions of the default part.” (28)
References
- @2025icardResource, Resource Rationality
Code snippets
\langle \mathcal S, \mathcal A, q, u \rangle