Overview

@2022hoPeople introduce value-guided task construals to model the process of adaptively selecting simplified representations of cause-effect relationships during task planning. Intuitively, a construal “picks out” details in a task to consider.

The problem of selecting a task construal is formulated as an approximately optimal trade-off between cognitive cost and task performance, or behavioral utility. This gives a normative, resource-rational account of planning (@2023hoRational).

The key idea is to treat model and policy selection as a two-level optimization process: an outer loop selects a construal, or simplified model of cause-effect relationships, by optimizing the value of representation over task construals; this is then used by an inner loop planning algorithm to compute the optimal policy.

Preliminaries: MDP models of sequential decision-making

MDP model of sequential decision-making tasks

A task representation $T$ consists of the following data: a state space $S$ with initial state $s_{0} \in S$; an action space $A$; a transition function $P : S \times A \times S \to [0, 1]$; and a utility function $U : S \to \mathbb{R}$. The value of a plan $π : S \times A \to [0, 1]$ is defined for all states $s \in S$ by the expected cumulative utility of using that plan:
$V_{π}(s) = U(s) + \sum_{a} π(a ∣ s) \sum_{s'} P(s' ∣ s, a) V_{π}(s').$
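The recursion for the value of a plan can be illustrated with a minimal sketch in Python. Everything named here (`evaluate_plan`, the three-state chain, the `terminal` set, the number of sweeps) is an assumption made for illustration and does not come from the source. In particular, the equation has no discount factor, so the sketch assumes that every episode ends in a terminal state whose value is just its utility.

```python
def evaluate_plan(states, actions, P, U, pi, terminal, sweeps=50):
    """Repeatedly apply V(s) = U(s) + sum_a pi(a|s) sum_s' P(s'|s,a) V(s').

    P[s][a] maps next states to probabilities; pi[s][a] is the plan's
    probability of taking action a in state s. States in `terminal` end
    the episode, so their value is just U(s) (an assumption of this sketch).
    """
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {
            s: U[s] if s in terminal else U[s] + sum(
                pi[s][a] * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in actions
            )
            for s in states
        }
    return V


# Hypothetical chain 0 -> 1 -> 2, where state 2 is terminal.
states = [0, 1, 2]
actions = ["stay", "go"]
terminal = {2}
U = {0: -1.0, 1: -1.0, 2: 10.0}
P = {s: {"stay": {s: 1.0}, "go": {s + 1: 1.0}} for s in (0, 1)}
pi = {s: {"stay": 0.0, "go": 1.0} for s in (0, 1)}  # plan: always "go"

V = evaluate_plan(states, actions, P, U, pi, terminal)
print(V)
```

For this toy chain the plan pays a step utility of -1 in each non-terminal state before collecting 10 at the end, so the values are 8, 9, and 10 for states 0, 1, and 2.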

Value-guided task construals

Construal

Suppose an agent has $N$ primitive cause-effect relationships assigning probabilities to state, action, and next-state transitions
$ϕ_{i} : S \times A \times S \to [0, 1], i = 1, \dots, N,$
where each $ϕ_{i}(s' ∣ s, a)$ is a potential function representing the local effect of taking some action. A construal $c \subseteq \{ϕ_{1}, \dots, ϕ_{N}\}$ is a subset of primitive cause-effect relationships. It produces a task construal $T_{c}$ that shares the same state space, action space, and utility function with $T$, but has a construed transition function
$P_{c}(s' ∣ s, a) \propto \prod_{ϕ_{i} \in c} ϕ_{i}(s' ∣ s, a).$
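A small sketch may help show how a product of potentials turns into a construed transition function. The potentials `phi_move` and `phi_wall`, the four-state line world, and the function name `construed_transition` are all hypothetical and chosen only for illustration. The normalization over next states is what the proportionality sign in the definition implies.

```python
def construed_transition(states, construal, s, a):
    """Return P_c(. | s, a): the normalized product of the potentials in c."""
    weights = {s2: 1.0 for s2 in states}
    for phi in construal:
        for s2 in states:
            weights[s2] *= phi(s2, s, a)
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("construal assigns zero weight to every next state")
    return {s2: w / total for s2, w in weights.items()}


def phi_move(s2, s, a):
    """Hypothetical potential: moving by `a` is likely, staying put is not."""
    if s2 == s + a:
        return 1.0
    if s2 == s:
        return 0.1
    return 0.0


def phi_wall(s2, s, a):
    """Hypothetical potential: state 2 cannot be entered from elsewhere."""
    return 0.0 if (s2 == 2 and s != 2) else 1.0


states = [0, 1, 2, 3]
without_wall = construed_transition(states, [phi_move], s=1, a=+1)
with_wall = construed_transition(states, [phi_move, phi_wall], s=1, a=+1)
print(without_wall)
print(with_wall)
```

With only `phi_move` in the construal, the agent expects to reach state 2 with probability 1/1.1; once `phi_wall` is included, all probability mass shifts to staying in state 1. A construal that omits the wall is cheaper but predicts the wrong outcome.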

Behavioral utility, value of representation

Given a decision-maker with task construal $T_{c}$, the behavioral utility of the computed optimal plan $π_{c}$ when starting at state $s_{0}$ is defined as its performance when interacting with the actual transition dynamics $P$:
$U(π_{c}) = U(s_{0}) + \sum_{a} π_{c}(a ∣ s_{0}) \sum_{s'} P(s' ∣ s_{0}, a) V_{π_{c}}(s').$
The value of representation for the construal $c$ is
$VOR (c) = U (π_{c}) - ∣ c ∣,$
where $∣ c ∣$ is the cognitive cost defined as the cardinality of $c$ .
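
As a purely illustrative calculation (the numbers are hypothetical and not taken from the source), suppose a construal $c_{1} = \{ϕ_{1}\}$ yields behavioral utility $U(π_{c_{1}}) = 3$, while the larger construal $c_{2} = \{ϕ_{1}, ϕ_{2}\}$ yields $U(π_{c_{2}}) = 3.5$. Then
$VOR(c_{1}) = 3 - 1 = 2, \quad VOR(c_{2}) = 3.5 - 2 = 1.5,$
so the extra half unit of utility gained by attending to $ϕ_{2}$ does not pay for its unit of cognitive cost, and $c_{1}$ is the better representation under this measure.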

Model implementation

Given a value of representation function $VOR$ that assigns a value to each construal, decision-makers are modeled as selecting a construal according to a softmax decision rule: $P(c) \propto \exp(α^{-1} VOR(c)),$ where $α > 0$ is a temperature parameter.
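
The softmax rule can be sketched in a few lines of Python. The behavioral utilities below are made-up numbers, and the names `softmax_over_construals`, `phi_1`, and `phi_2` are hypothetical. Subtracting the maximum before exponentiating is a standard numerical-stability step and is not part of the model itself.

```python
import math


def softmax_over_construals(vor, alpha):
    """Return P(c) proportional to exp(VOR(c) / alpha) for each construal c."""
    if alpha <= 0:
        raise ValueError("temperature alpha must be positive")
    top = max(vor.values())  # shift for numerical stability only
    weights = {c: math.exp((v - top) / alpha) for c, v in vor.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical behavioral utilities U(pi_c) for every subset of two potentials.
behavioral_utility = {
    frozenset(): 0.0,
    frozenset({"phi_1"}): 3.0,
    frozenset({"phi_2"}): 0.5,
    frozenset({"phi_1", "phi_2"}): 3.5,
}

# VOR(c) = U(pi_c) - |c|, with cognitive cost equal to the construal's size.
vor = {c: u - len(c) for c, u in behavioral_utility.items()}

probs = softmax_over_construals(vor, alpha=0.5)
best = max(probs, key=probs.get)
print(sorted(best), round(probs[best], 3))
```

Lower values of `alpha` concentrate probability on the construal with the highest value of representation, while higher values spread it more evenly across construals.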
The process of revisiting and modifying construals at each stage of planning is represented as a sequential decision-making problem, the construal modification Markov decision process.

Construal modification Markov decision process

Given a set of $N$ cause-effect relationships, let $C = \mathcal{P}(\{ϕ_{1}, \dots, ϕ_{N}\})$ be the set of all possible construals (i.e., the powerset of cause-effect relationships). The construal modification Markov decision process has state space $S \times C$ and an action space corresponding to possible next construals $c' \in C$. After selecting a new construal $c'$, the probability of transitioning from task state $s$ to $s'$ is given by
$P(s' ∣ s, c') = \sum_{a} π_{c'}(a ∣ s) P(s' ∣ s, a);$
that is, a joint distribution over task actions and next states is first calculated from the actual transition function $P(s' ∣ s, a)$ and the plan $π_{c'}(a ∣ s)$, and the task actions $a$ are then marginalized out.
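
The marginalization over task actions is easy to check by hand, and the following sketch does so for a single state. The function name `induced_transition`, the two actions, and all probabilities are hypothetical values chosen for illustration.

```python
def induced_transition(pi_c, P, s):
    """Return P(. | s, c') = sum_a pi_c'(a | s) * P(. | s, a)."""
    out = {}
    for a, p_a in pi_c[s].items():
        for s2, p in P[s][a].items():
            out[s2] = out.get(s2, 0.0) + p_a * p
    return out


# Actual dynamics from state 0: "left" stays put, "right" usually advances.
P = {0: {"left": {0: 1.0}, "right": {1: 0.8, 0: 0.2}}}

# Hypothetical plan computed under some construal c'.
pi_c = {0: {"left": 0.25, "right": 0.75}}

next_state_probs = induced_transition(pi_c, P, 0)
print(next_state_probs)
```

Here the probability of staying in state 0 is 0.25 + 0.75 × 0.2 = 0.4, and the probability of reaching state 1 is 0.75 × 0.8 = 0.6.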

The optimal construal modification value function is defined for all $s \in S$ and $c \in C$ by
$V(s, c) = U(s) + \max_{c'} \left\{ \sum_{s'} P(s' ∣ s, c') V(s', c') - ∣ c' - c ∣ \right\},$
where $∣ c' - c ∣$ is the number of additional cause-effect relationships in $c'$ compared to $c$.
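
One way to get a feel for this value function is to apply its backup a fixed number of times. The sketch below does that under stated assumptions: the finite `horizon` truncation is introduced here only so that the computation terminates and is not described in the source; the induced transitions `P_induced[c'][s]` are taken as given rather than derived from plans; and the three-state task and single potential `phi` are hypothetical.

```python
def construal_modification_values(states, construals, U, P_induced, horizon):
    """Apply the backup
    V(s, c) = U(s) + max_c' [ sum_s' P(s'|s,c') V(s', c') - |c' - c| ]
    `horizon` times, starting from V(s, c) = U(s)."""
    V = {(s, c): U[s] for s in states for c in construals}
    for _ in range(horizon):
        V = {
            (s, c): U[s] + max(
                sum(p * V[(s2, c2)] for s2, p in P_induced[c2][s].items())
                - len(c2 - c)  # count of relationships newly added by c'
                for c2 in construals
            )
            for s in states
            for c in construals
        }
    return V


empty, full = frozenset(), frozenset({"phi"})
states = ["start", "goal", "done"]
U = {"start": -1.0, "goal": 10.0, "done": 0.0}

# Hypothetical induced dynamics: the plan built without phi rarely leaves
# "start", while the plan built with phi always reaches "goal".
P_induced = {
    empty: {"start": {"goal": 0.2, "start": 0.8}, "goal": {"done": 1.0}, "done": {"done": 1.0}},
    full: {"start": {"goal": 1.0}, "goal": {"done": 1.0}, "done": {"done": 1.0}},
}

V = construal_modification_values(states, [empty, full], U, P_induced, horizon=5)
print(V[("start", empty)], V[("start", full)])
```

In this toy case it is worth paying the unit cost of adding `phi` from the empty construal, which gives a value of 8 at the start state. An agent that already holds `phi` pays nothing extra and obtains 9.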

BONNIE'S NOTES

(Model) Value-guided task construals, after Ho et al. (2022)
