Overview
@2022hoPeople introduce value-guided task construals to model the process of adaptively selecting simplified representations of cause-effect relationships during task planning. Intuitively, a construal “picks out” details in a task to consider.
The problem of selecting a task construal is formulated as an approximately optimal trade-off between cognitive cost and task performance, or behavioral utility. This gives a normative, resource-rational account of planning (@2023hoRational).
The key idea is to treat model and policy selection as a two-level optimization process: an outer loop selects a construal, or simplified model of cause-effect relationships, by optimizing the value of representation over task construals; an inner-loop planning algorithm then uses the selected construal to compute the policy that is optimal under it.
Preliminaries: MDP models of sequential decision-making
MDP model of sequential decision-making tasks
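A task representation $M$ consists of the following data: a state space $\mathcal{S}$ with initial state $s_0 \in \mathcal{S}$; an action space $\mathcal{A}$; a transition function $T : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S})$; and a utility function $U : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$. The value of a plan $\pi$ is defined for all states $s \in \mathcal{S}$ by the expected cumulative utility of using that plan:

$$V_\pi(s) = \sum_{a} \pi(a \mid s) \Big[ U(s, a) + \sum_{s'} T(s' \mid s, a)\, V_\pi(s') \Big].$$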
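The recursive definition of a plan's value can be computed by repeated sweeps over the states. The following is a minimal sketch, assuming a deterministic plan, an episodic task with an absorbing goal state, and no discounting; the corridor task and the names `evaluate_plan`, `T`, `U`, and `plan` are hypothetical and chosen only for illustration.

```python
def evaluate_plan(states, T, U, plan, terminal, n_sweeps=1000, tol=1e-10):
    """Iteratively evaluate a deterministic plan on an episodic MDP.

    T[(s, a)] is a dict {next_state: probability}, U[(s, a)] is a float,
    and plan[s] is the action taken in state s. Terminal states have value 0.
    """
    V = {s: 0.0 for s in states}
    for _ in range(n_sweeps):
        delta = 0.0
        for s in states:
            if s in terminal:
                continue
            a = plan[s]
            # Immediate utility plus expected value of the next state.
            new_v = U[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V


# Hypothetical corridor 0 -> 1 -> 2 (goal). Each step has utility -1, and
# moving "right" fails (the agent stays put) with probability 0.2.
states = [0, 1, 2]
terminal = {2}
T = {(0, "right"): {1: 0.8, 0: 0.2}, (1, "right"): {2: 0.8, 1: 0.2}}
U = {(0, "right"): -1.0, (1, "right"): -1.0}
plan = {0: "right", 1: "right"}

V = evaluate_plan(states, T, U, plan, terminal)
print(V)  # V[0] is approximately -2.5 and V[1] approximately -1.25
```

Each step succeeds with probability 0.8, so the expected number of steps per cell is 1.25, which is where the two values come from.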
Value-guided task construals
Construal
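Suppose an agent has $N$ primitive cause-effect relationships assigning probabilities to state, action, and next-state transitions,

$$\phi_i : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1], \quad i = 1, \dots, N,$$

where each $\phi_i$ is a potential function representing the local effect of taking some action. A construal is a subset of primitive cause-effect relationships $c \subseteq \{\phi_1, \dots, \phi_N\}$ that produces a task construal $M_c$ that shares the same state space, action space, and utility function with the original task $M$, but has a construed transition function

$$T_c(s' \mid s, a) \propto \prod_{\phi_i \in c} \phi_i(s, a, s').$$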
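To make the construed transition function concrete, the sketch below multiplies the potentials included in a construal and normalizes over next states. It is an illustration under stated assumptions, not the authors' implementation: the four-cell corridor, the potentials `phi_move` and `phi_wall`, and the helper `construed_transition` are all hypothetical.

```python
def construed_transition(potentials, construal, s, a, next_states):
    """Return the construed distribution over next states for (s, a):
    the normalized product of the potentials named in the construal."""
    weights = {}
    for s2 in next_states:
        w = 1.0
        for name in construal:
            w *= potentials[name](s, a, s2)
        weights[s2] = w
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("construal assigns zero weight to every next state")
    return {s2: w / total for s2, w in weights.items()}


def phi_move(s, a, s2):
    """Hypothetical movement potential on cells 0..3: the intended cell gets
    weight 0.9, staying in place gets 0.1, everything else gets 0."""
    intended = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    if s2 == intended:
        return 0.9
    if s2 == s:
        return 0.1
    return 0.0


def phi_wall(s, a, s2):
    """Hypothetical obstacle potential: cell 2 cannot be entered."""
    return 0.0 if (s2 == 2 and s != 2) else 1.0


potentials = {"move": phi_move, "wall": phi_wall}
cells = range(4)

# Construal that ignores the wall: the agent believes it can step into cell 2.
ignoring_wall = construed_transition(potentials, {"move"}, 1, "right", cells)
# Construal that includes the wall: moving right from cell 1 leaves it in place.
with_wall = construed_transition(potentials, {"move", "wall"}, 1, "right", cells)
print(ignoring_wall[2], with_wall[1])
```

In this sketch an empty construal yields a uniform distribution over next states, because the empty product is 1 for every candidate.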
Behavioral utility, value of representation
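Given a decision-maker with task construal $M_c$, the behavioral utility of the computed optimal plan $\pi_c$ when starting at state $s_0$ is defined as its performance when interacting with the actual transition dynamics $T$:

$$U(\pi_c) = \sum_{a} \pi_c(a \mid s_0) \Big[ U(s_0, a) + \sum_{s'} T(s' \mid s_0, a)\, V_{\pi_c}(s') \Big],$$

where $V_{\pi_c}$ is the value of $\pi_c$ evaluated under $T$.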
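The value of representation for the construal $c$ is

$$\mathrm{VOR}(c) = U(\pi_c) - C(c),$$

where $U(\pi_c)$ is the behavioral utility of the plan computed under $c$ and $C(c) = |c|$ is the cognitive cost, defined as the cardinality of $c$.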
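The value of representation combines three steps: plan under the construed model, evaluate the resulting plan under the actual dynamics, and subtract the number of relationships in the construal. The sketch below walks through these steps on a hypothetical five-state task with a direct route that is almost blocked by a wall and a longer detour. Every name here (`phi_move`, `phi_wall`, `plan_in`, `evaluate`, `value_of_representation`) is an assumption made for illustration, the plan is taken to be deterministic, and value iteration stands in for whatever planner is actually used.

```python
import math

STATES = [0, 1, 2, 3, 4]             # state 3 is the absorbing goal
ACTIONS = ["direct", "detour"]
GOAL = 3
FIRST_MOVE = {(0, "direct"): 1, (0, "detour"): 2}   # choice at the start
ROUTE = {1: 3, 2: 4, 4: 3}                           # forced moves elsewhere


def phi_move(s, a, s2):
    """Movement potential: intended state 0.9, staying put 0.1, else 0."""
    target = FIRST_MOVE.get((s, a), ROUTE.get(s))
    if s2 == target:
        return 0.9
    return 0.1 if s2 == s else 0.0


def phi_wall(s, a, s2):
    """Wall potential: stepping from 0 to 1 is strongly suppressed."""
    return 0.01 if (s == 0 and s2 == 1) else 1.0


POTENTIALS = [phi_move, phi_wall]


def transition(construal, s, a):
    """Normalized product of the potentials whose indices are in construal."""
    w = {s2: math.prod(POTENTIALS[i](s, a, s2) for i in construal) for s2 in STATES}
    z = sum(w.values())
    return {s2: x / z for s2, x in w.items()}


def q_value(V, construal, s, a):
    """Step utility of -1 plus expected next-state value under the construal."""
    return -1.0 + sum(p * V[s2] for s2, p in transition(construal, s, a).items())


def plan_in(construal, sweeps=500):
    """Inner loop: value iteration in the construed model, then a greedy plan."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s != GOAL:
                V[s] = max(q_value(V, construal, s, a) for a in ACTIONS)
    return {s: max(ACTIONS, key=lambda a: q_value(V, construal, s, a))
            for s in STATES if s != GOAL}


def evaluate(plan, sweeps=500):
    """Evaluate a plan under the actual dynamics (all potentials included)."""
    true_model = range(len(POTENTIALS))
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in plan:
            V[s] = q_value(V, true_model, s, plan[s])
    return V


def value_of_representation(construal, s0=0):
    """Behavioral utility of the construed plan minus the construal's size."""
    return evaluate(plan_in(construal))[s0] - len(construal)


vor_simple = value_of_representation((0,))     # movement only, wall ignored
vor_full = value_of_representation((0, 1))     # movement and wall
print(vor_simple, vor_full)
```

With these made-up numbers, ignoring the wall leads the planner onto the nearly blocked direct route, so the larger construal has the higher value of representation despite its extra unit of cost. With a weaker wall the ordering can flip, which is the trade-off the definition is meant to capture.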
Model implementation
- Given a value of representation function $\mathrm{VOR}$ that assigns a value to each construal, decision-makers are modeled as selecting a construal according to a softmax decision rule, $P(c) \propto \exp\{\alpha^{-1}\,\mathrm{VOR}(c)\}$, where $\alpha > 0$ is a temperature parameter.
- The process of revisiting and modifying construals at each stage of planning is represented as a sequential decision-making problem, the construal modification Markov decision process.
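The softmax decision rule in the first item can be sketched as follows. The value-of-representation numbers and the construal labels (`"wall"`, `"pit"`) are invented for illustration, and `softmax_construal_probs` is a hypothetical helper name.

```python
import math


def softmax_construal_probs(vor, alpha):
    """Return selection probabilities proportional to exp(VOR(c) / alpha).

    vor maps each construal (a frozenset of relationship labels) to its
    value of representation; alpha > 0 is the temperature.
    """
    if alpha <= 0:
        raise ValueError("temperature alpha must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    best = max(vor.values())
    weights = {c: math.exp((v - best) / alpha) for c, v in vor.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


# Made-up values of representation for three construals.
vor = {
    frozenset(): -9.0,
    frozenset({"wall"}): -5.0,
    frozenset({"wall", "pit"}): -6.0,
}
probs = softmax_construal_probs(vor, alpha=1.0)
most_likely = max(probs, key=probs.get)
print(most_likely, probs[most_likely])
```

Lower temperatures concentrate probability on the construal with the highest value of representation, while higher temperatures make selection closer to uniform.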
Construal modification Markov decision process
Given a set of cause-effect relationships, let $\mathcal{C}$ be the set of all possible construals (i.e., the powerset of cause-effect relationships). The construal modification Markov decision process has state space $\mathcal{C} \times \mathcal{S}$ and an action space corresponding to possible next construals $c' \in \mathcal{C}$. After selecting a new construal $c'$, the probability of transitioning from task state $s$ to $s'$ is given by

$$P(s' \mid s, c') = \sum_{a} \pi_{c'}(a \mid s)\, T(s' \mid s, a),$$

that is, first calculating a joint distribution using the actual transition function $T$ and plan $\pi_{c'}$, then marginalizing over task actions $a$.
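The marginalization over task actions can be sketched in a few lines. This assumes the plan computed under the new construal is stored as a per-state distribution over actions; the dictionaries `plan` and `T` and the function name `task_state_distribution` are hypothetical.

```python
def task_state_distribution(plan, T, s):
    """Distribution over next task states after committing to a construal.

    plan[s] maps actions to probabilities under the construal's plan;
    T[(s, a)] maps next states to probabilities under the actual dynamics.
    """
    out = {}
    for a, p_a in plan[s].items():
        for s2, p_s2 in T[(s, a)].items():
            # Joint probability of (a, s2), accumulated over actions.
            out[s2] = out.get(s2, 0.0) + p_a * p_s2
    return out


# Made-up example: in state 0 the plan goes right 75% of the time,
# and moving right succeeds with probability 0.8.
plan = {0: {"left": 0.25, "right": 0.75}}
T = {(0, "left"): {0: 1.0}, (0, "right"): {1: 0.8, 0: 0.2}}
dist = task_state_distribution(plan, T, 0)
print(dist)  # state 0 with probability about 0.4, state 1 with about 0.6
```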
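The optimal construal modification value function is defined for all $c \in \mathcal{C}$ and $s \in \mathcal{S}$ by

$$V^*(c, s) = \max_{c' \in \mathcal{C}} \Big\{ \sum_{a} \pi_{c'}(a \mid s)\, U(s, a) - |c' \setminus c| + \sum_{s'} P(s' \mid s, c')\, V^*(c', s') \Big\},$$

where $P(s' \mid s, c')$ is the probability of moving from task state $s$ to $s'$ after selecting $c'$, and $|c' \setminus c|$ is the number of additional cause-effect relationships in $c'$ compared to $c$.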