Overview
Traditional decision theory aims to assess the behavioral interactions between an agent and its environment by operationalizing the agent’s external behavior as a strategy and assumptions about what is “good” or “bad” for the agent as utility.
Formally, a sequential decision problem is a tuple $\langle \mathcal S, \mathcal A, q, u \rangle$, where $\mathcal S$ is a set of states, $\mathcal A$ is a set of actions, $q$ is a transition function mapping history-action pairs to distributions over states, and $u$ is a utility function associating a reward to each action after following a certain history.
Related notes:
- Markov decision processes: a special case of sequential decision problems that satisfy the Markov property, meaning that whenever two histories $h, h'$ agree on their final state, they induce the same mapping $q(\cdot \mid h, a) = q(\cdot \mid h', a)$ and precisely equal rewards $u(h, a) = u(h', a)$.
- (Resource-)rational analysis: a framework distinguished from traditional decision theory by accounting for internal structures that impose constraints on optimal behavior.
Preliminaries: Conceptual primitives of Bayesian decision theory
States, observations, and probabilities
- States: a set $\mathcal S$ consisting of ways the world may be at any given time; they include everything that might “matter” to an agent.
- Probability of a state: the value $p(s)$ of a function $p$ assigning a probability to each state $s \in \mathcal S$; the fact that this is a probability means simply that the axiom $\sum_{s \in \mathcal S} p(s) = 1$ is satisfied. The set of all probability distributions on a set $X$ is denoted $\Delta(X)$.
- Observations: a set $\mathcal O$ of (potentially imperfect) inputs taken by the agent to reflect each state.
- concept-question In a bijection with the set of states?
- Likelihood of an observation: for each state $s$, a function $p(\cdot \mid s) \in \Delta(\mathcal O)$ for which $p(o \mid s)$ specifies the probability of observing $o$ in the current state $s$.
- Posterior probability of a state: after observing $o$, the probability that $s$ is what generated $o$ is given by Bayes’ rule $p(s \mid o) = \frac{p(o \mid s)\, p(s)}{\sum_{s' \in \mathcal S} p(o \mid s')\, p(s')}$ or, equivalently, the proportionality $p(s \mid o) \propto p(o \mid s)\, p(s)$.
- concept-question ?
- For a sequence of (independently generated) observations $o_1, \dots, o_n$, the posterior is proportional to the product $p(s \mid o_1, \dots, o_n) \propto p(s) \prod_{i=1}^{n} p(o_i \mid s)$, as sketched below.
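To make the update concrete, here is a minimal Python sketch of the posterior computation over a discrete state set (the states, prior, and likelihoods are invented for illustration):

```python
from math import prod

def posterior(prior, likelihood, observations):
    """p(s | o_1, ..., o_n) ∝ p(s) · Π_i p(o_i | s), then normalize."""
    unnormalized = {
        s: prior[s] * prod(likelihood[s][o] for o in observations)
        for s in prior
    }
    z = sum(unnormalized.values())
    return {s: w / z for s, w in unnormalized.items()}

# Hypothetical two-state example: does "wet" evidence indicate rain?
prior = {"rain": 0.3, "sun": 0.7}
likelihood = {                        # p(o | s) for each state s
    "rain": {"wet": 0.9, "dry": 0.1},
    "sun":  {"wet": 0.2, "dry": 0.8},
}
print(posterior(prior, likelihood, ["wet", "wet"]))
# posterior mass shifts toward "rain" after two independent "wet" observations
```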
Actions
- Actions: a set $\mathcal A$ whose elements are minimally understood as events that are directly under an agent’s control.
- Stochastic or non-deterministic behavior can be represented by the set $\Delta(\mathcal A)$ of probability distributions on $\mathcal A$, where each element $\pi \in \Delta(\mathcal A)$ gives the probability $\pi(a)$ with which an agent takes a given action $a$.
Utility
- Utility function: a map $u : \mathcal S \times \mathcal A \to \mathbb R$ that associates each state-action pair $(s, a)$ with a real-valued quantity $u(s, a)$ that measures, roughly, how “good” or “desirable” it is to take action $a$ in situation $s$.
- Expected utility of an action: when the state $s$ is uncertain, the expected utility of taking $a$ is the average of $u(s, a)$ when $s$ is drawn with probability $p(s)$: $\mathrm{EU}(a) = \sum_{s \in \mathcal S} p(s)\, u(s, a)$. Alternatively, if the actions have uncertain outcomes and utility depends only on the state, we use the expected utility $\mathrm{EU}(a) = \sum_{s \in \mathcal S} p(s \mid a)\, u(s)$. A sketch of both forms follows.
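A short Python sketch of both forms of expected utility (the state names, probabilities, and utilities are hypothetical):

```python
def expected_utility(p, u, a):
    """EU(a) = Σ_s p(s) · u(s, a): state uncertainty, utility on (s, a) pairs."""
    return sum(p[s] * u[(s, a)] for s in p)

def expected_utility_outcomes(p_given_a, u, a):
    """EU(a) = Σ_s p(s | a) · u(s): uncertain outcomes, utility on states only."""
    return sum(p_given_a[a][s] * u[s] for s in p_given_a[a])

# Hypothetical umbrella decision under a belief over the weather.
p = {"rain": 0.3, "sun": 0.7}
u = {("rain", "umbrella"): 1.0, ("rain", "none"): -1.0,
     ("sun", "umbrella"): 0.2, ("sun", "none"): 1.0}
best = max(["umbrella", "none"], key=lambda a: expected_utility(p, u, a))
print(best, expected_utility(p, u, best))  # "umbrella" with EU 0.44
```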
Sequential decision problems and strategies
History
Let $\mathcal S$ and $\mathcal A$ be sets of possible states and actions, respectively. A history is a sequence $h = (s_0, a_0, s_1, a_1, \dots, s_t)$ of length $t$ produced by an interaction between an agent and its environment up to time $t$, where each action $a_i$ produces a new state $s_{i+1}$. The set of all histories of length $t$ is denoted $\mathcal H_t$; we write $\mathcal H = \bigcup_{t \geq 0} \mathcal H_t$ for the set of all histories.
Sequential decision problem
A sequential decision problem is a tuple $\langle \mathcal S, \mathcal A, q, u \rangle$, where:
- $\mathcal S$ is a set of states;
- $\mathcal A$ is a set of actions;
- $q : \mathcal H \times \mathcal A \to \Delta(\mathcal S)$ is a transition function capturing environmental dynamics, with $q(s' \mid h, a)$ giving the probability of reaching $s'$ in one step following history $h$ when the agent performs action $a$;
- $u : \mathcal H \times \mathcal A \to \mathbb R$ is the utility function, with $u(h, a)$ giving the utility or reward obtained from taking action $a$ after following history $h$.
Strategy
A strategy, also known as a policy, is a function $\pi : \mathcal H \to \Delta(\mathcal A)$ which specifies a distribution on actions for every possible history $h \in \mathcal H$. We write $\pi(a \mid h)$ for the probability of action $a$ at history $h$. A stochastic policy is also known as a mixed strategy.
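These definitions translate directly into code. The following Python sketch of a sequential decision problem and a policy rollout uses interface names of my own choosing (SequentialDecisionProblem, rollout), not anything from @2025icardResource:

```python
import random
from dataclasses import dataclass
from typing import Callable

History = tuple   # (s_0, a_0, s_1, ..., s_t): alternating states and actions
Dist = dict       # a finite distribution: outcome -> probability

@dataclass
class SequentialDecisionProblem:
    """The tuple ⟨S, A, q, u⟩ from the definition above."""
    states: frozenset
    actions: frozenset
    q: Callable[[History, object], Dist]   # q(· | h, a) ∈ Δ(S)
    u: Callable[[History, object], float]  # u(h, a) ∈ ℝ

def sample(dist: Dist):
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def rollout(problem, policy, s0, horizon):
    """Simulate a (possibly stochastic) policy π : H → Δ(A) for `horizon` steps."""
    h, total = (s0,), 0.0
    for _ in range(horizon):
        a = sample(policy(h))         # draw a ~ π(· | h)
        total += problem.u(h, a)      # collect the reward u(h, a)
        s = sample(problem.q(h, a))   # draw s' ~ q(· | h, a)
        h = h + (a, s)
    return h, total
```

Histories are kept as plain tuples so that $q$, $u$, and the policy can inspect the full interaction; the MDPs below only ever need the last state.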
Markov decision processes and stationary strategies
Primary note: Markov decision processes
Markov decision process
A Markov decision process is a sequential decision problem $\langle \mathcal S, \mathcal A, q, u \rangle$ which satisfies the following:
- (i) For all histories $h, h' \in \mathcal H$ that agree on their last state and all $a \in \mathcal A$, we have $q(\cdot \mid h, a) = q(\cdot \mid h', a)$;
- (ii) Whenever $h, h' \in \mathcal H$ agree on their last state, the rewards also agree: $u(h, a) = u(h', a)$ for all $a \in \mathcal A$.
Thus, when considering MDPs we write $q(s' \mid s, a)$ and $u(s, a)$ instead of $q(s' \mid h, a)$ and $u(h, a)$, respectively.
Stationary strategy
A strategy or policy is said to be stationary if it depends only on the current state, i.e., $\pi$ is a map $\pi : \mathcal S \to \Delta(\mathcal A)$.
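Because both $q$ and $u$ only consult the last state, a stationary policy can be tabulated directly by state. A self-contained Python sketch with an invented two-state maintenance MDP:

```python
import random

# Hypothetical two-state MDP: transitions q(s' | s, a) and rewards u(s, a)
# depend only on the current state, so a stationary policy π : S → Δ(A) suffices.
q = {("ok", "repair"):     {"ok": 0.95, "broken": 0.05},
     ("ok", "ignore"):     {"ok": 0.70, "broken": 0.30},
     ("broken", "repair"): {"ok": 0.80, "broken": 0.20},
     ("broken", "ignore"): {"broken": 1.00}}
u = {("ok", "repair"): -1.0, ("ok", "ignore"): 2.0,
     ("broken", "repair"): -3.0, ("broken", "ignore"): -2.0}
pi = {"ok":     {"repair": 0.1, "ignore": 0.9},   # stationary: keyed by state
      "broken": {"repair": 1.0, "ignore": 0.0}}

def sample(dist):
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def simulate(s, steps):
    total = 0.0
    for _ in range(steps):
        a = sample(pi[s])        # π(· | s): depends only on the current state
        total += u[(s, a)]
        s = sample(q[(s, a)])
    return total

print(simulate("ok", 100))
```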
Selected quotations
- Lewis (1974, quoted in @2025icardResource) on expected utility theory as “a yardstick for rational choice”: Expected utility theory “is the very core of our common-sense theory of persons, dissected out and elegantly systematized”. (26)
- Meek and Glymour (1994, quoted in @2025icardResource) on the tension between predictability of decisions and overriding external causal influences: “One may view decisions, one’s own or another’s, as the result of a dual system with a default part and an extraordinary part—the default part subject to causes that may also influence the outcome through another mechanism, but the extraordinary part not so influenced and having the power to intervene and displace or modify the productions of the default part.” (28)
References
- @2025icardResource, Resource Rationality
Code snippets
\langle \mathcal S, \mathcal A, q, u \rangle