Overview
The Rational Speech Acts (RSA) framework aims to model how people make pragmatic inferences beyond the literal semantics of language.
RSA treats speakers as boundedly rational agents (i.e., agents whose decisions are optimal with respect to both environmental utility and internal cognitive constraints) who trade off informativeness against utterance cost. Speakers choose utterances by simulating a listener: from a fixed set of alternatives, they prefer the utterance most likely to lead that listener to the true world state.
Different listener models correspond to different speaker utilities:
Listener type | Description | Implementation | Speaker utility (of an utterance) |
---|---|---|---|
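Literal | Interprets utterances according to their lexical meanings $[\![u]\!]$. | Uniform distribution over the world states where the utterance is true (uniform prior $P(w)$): $L_0(w \mid u) \propto [\![u]\!](w)\,P(w)$ | The listener's information about the true world state $w$ after hearing $u$, measured as negative surprisal: $\log L_0(w \mid u)$ |
Pragmatic | Formalizes Gricean pragmatics and accounts for the relevance of an utterance. | Recursively embeds a speaker model, which in turn embeds a literal listener: $L_1(w \mid u) \propto S_1(u \mid w)\,P(w)$ | Open research question. |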
Basic framework
- The pragmatic listener infers the world state $w$ by conditioning on the speaker having chosen utterance $u$: $L_1(w \mid u) \propto S_1(u \mid w)\,P(w)$
- The speaker chooses an utterance $u$ approximately rationally, with probability proportional to its exponentiated utility, where $\alpha$ controls how strongly utility is maximized (i.e., how rational the speaker is): $S_1(u \mid w) \propto \exp\big(\alpha\,U_{S_1}(u; w)\big)$
- The speaker utility is determined by how certain a literal listener becomes about world state $w$ after hearing utterance $u$, minus the utterance's cost $C(u)$: $U_{S_1}(u; w) = \log L_0(w \mid u) - C(u)$
- The literal listener updates its beliefs about the world assuming the utterance is literally true, i.e., according to its semantic denotation $[\![u]\!]$: $L_0(w \mid u) \propto [\![u]\!](w)\,P(w)$
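The literal listener, speaker, and pragmatic listener described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical three-world scalar-implicature domain ("none", "some-not-all", "all"), Boolean semantics, a uniform prior, and zero utterance costs; the names `MEANING`, `literal_listener`, `speaker`, and `pragmatic_listener` are illustrative, not from any library.

```python
import math

# Hypothetical toy domain (illustrative only): how many of the items were eaten.
WORLDS = ["none", "some-not-all", "all"]
UTTERANCES = ["none", "some", "all"]

# Boolean semantic denotation [[u]](w): 1.0 if u is literally true in w.
MEANING = {
    "none": {"none": 1.0, "some-not-all": 0.0, "all": 0.0},
    "some": {"none": 0.0, "some-not-all": 1.0, "all": 1.0},
    "all":  {"none": 0.0, "some-not-all": 0.0, "all": 1.0},
}
PRIOR = {w: 1 / len(WORLDS) for w in WORLDS}  # uniform P(w)
COST = {u: 0.0 for u in UTTERANCES}           # C(u); zero for simplicity


def normalize(scores):
    """Rescale non-negative scores into a probability distribution."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def literal_listener(u):
    """L0(w | u) ∝ [[u]](w) P(w)."""
    return normalize({w: MEANING[u][w] * PRIOR[w] for w in WORLDS})


def speaker(w, alpha=1.0):
    """S1(u | w) ∝ exp(alpha * (log L0(w | u) - C(u))).

    Written as L0(w | u) ** alpha * exp(-alpha * C(u)) so that utterances
    that are false in w (L0 = 0, log L0 = -inf) get probability 0.
    """
    return normalize({
        u: literal_listener(u)[w] ** alpha * math.exp(-alpha * COST[u])
        for u in UTTERANCES
    })


def pragmatic_listener(u, alpha=1.0):
    """L1(w | u) ∝ S1(u | w) P(w)."""
    return normalize({w: speaker(w, alpha)[u] * PRIOR[w] for w in WORLDS})


print(pragmatic_listener("some"))
```

Hearing "some", the literal listener splits its belief evenly between "some-not-all" and "all", whereas the pragmatic listener puts about 0.75 on "some-not-all" and 0.25 on "all", because a speaker in the "all" world would often have said "all". Raising `alpha` to 2 pushes the "some-not-all" share to about 0.83.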
Variants
Continuous semantics, after @2020degen
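In the continuous-semantics variant, the Boolean denotation $[\![u]\!](w) \in \{0, 1\}$ is replaced by a graded value in $[0, 1]$, which then feeds the same literal-listener and speaker equations. The sketch below is a minimal illustration, assuming a hypothetical three-object reference game, multi-word utterances composed by multiplying word values, and made-up fidelities (0.8 for size words, 0.99 for color words). These numbers and the names `word_value`, `meaning`, and `cost_per_word` are illustrative, not estimates or code from @2020degen.

```python
import math

# Hypothetical reference game (illustrative only): objects are (size, color).
OBJECTS = [("small", "blue"), ("big", "blue"), ("big", "red")]
UTTERANCES = ["small", "blue", "small blue"]
SIZE_FIDELITY = 0.8    # assumed: size words are noisier...
COLOR_FIDELITY = 0.99  # ...than color words


def word_value(word, obj):
    """Graded semantic value of a single word for an object, in [0, 1]."""
    size, color = obj
    if word in ("small", "big"):
        return SIZE_FIDELITY if word == size else 1 - SIZE_FIDELITY
    return COLOR_FIDELITY if word == color else 1 - COLOR_FIDELITY


def meaning(utterance, obj):
    """Continuous [[u]](o): product of word values (assumed composition rule)."""
    return math.prod(word_value(word, obj) for word in utterance.split())


def literal_listener(utterance):
    """L0(o | u) ∝ [[u]](o) P(o); a uniform prior cancels in the normalization."""
    scores = [meaning(utterance, o) for o in OBJECTS]
    total = sum(scores)
    return {o: s / total for o, s in zip(OBJECTS, scores)}


def speaker(target, alpha=1.0, cost_per_word=0.0):
    """S1(u | o) ∝ exp(alpha * (log L0(o | u) - C(u))), with C(u) = cost_per_word * #words."""
    scores = {
        u: math.exp(alpha * (math.log(literal_listener(u)[target])
                             - cost_per_word * len(u.split())))
        for u in UTTERANCES
    }
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}


print(speaker(("small", "blue")))
```

With zero cost, the redundant "small blue" gets the highest probability (about 0.41 versus 0.34 for "small"), because the reliable color word adds evidence that the noisy size word alone does not provide. With `cost_per_word=0.5`, the cheaper "small" wins instead.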
Integrating epistemic and decision-theoretic utility in speaker utilities, after @2024sumersReconciling
- The truthfulness, or epistemic utility, of an utterance is defined as a penalty on false utterances, with a threshold determined by the speaker's softmax optimality (around is a preference for true utterances, while recovers a more typical RSA constraint to true utterances):
- The listener's expected reward for taking action $a$ after hearing $u$ is a scalar obtained by conditioning the listener's beliefs about the world state on the utterance, then marginalizing over worlds: $\mathbb{E}[R(a) \mid u] = \sum_w L_0(w \mid u)\,R(w, a)$
- The listener's decision policy is a softmax over these expected rewards, with inverse temperature $\beta_L$: $\pi_L(a \mid u) \propto \exp\big(\beta_L\,\mathbb{E}[R(a) \mid u]\big)$
- The relevance, or decision-theoretic utility, of an utterance is the expected reward of the listener's decision policy after hearing it, evaluated in the true world state $w$; informally, more relevant utterances induce beliefs that improve the listener's decisions: $U_R(u \mid w) = \sum_a \pi_L(a \mid u)\,R(w, a)$
- The speaker utility is a convex combination of truthfulness $U_T$ (epistemic accuracy) and relevance $U_R$ (decision-theoretic utility), weighted by $\lambda \in [0, 1]$, minus a cost term: $U(u \mid w) = \lambda\,U_R(u \mid w) + (1 - \lambda)\,U_T(u \mid w) - C(u)$
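The pieces of this variant can be combined in a short sketch that shows how the weight on relevance shifts the speaker between true and useful utterances. It assumes a hypothetical weather domain whose utterance set deliberately lacks an exact description of drizzle, takes the listener's beliefs to be the literal listener, and, because the source's truthfulness formula is missing, assumes a simple form for it (0 for true utterances, `-penalty` for false ones). The parameter names `lam` (for $\lambda$), `beta`, `alpha`, and `penalty` are illustrative, and this is not the authors' implementation.

```python
import math

# Hypothetical toy domain (illustrative only, not from @2024sumersReconciling).
WORLDS = ["clear", "drizzle", "storm"]
ACTIONS = ["umbrella", "no umbrella"]
UTTERANCES = ["wet", "clear"]  # no exact word for "drizzle" on purpose
MEANING = {  # Boolean [[u]](w)
    "wet": {"clear": 0.0, "drizzle": 1.0, "storm": 1.0},
    "clear": {"clear": 1.0, "drizzle": 0.0, "storm": 0.0},
}
REWARD = {  # R(w, a): an umbrella only pays off in a storm
    "clear": {"umbrella": 0.0, "no umbrella": 1.0},
    "drizzle": {"umbrella": 0.0, "no umbrella": 1.0},
    "storm": {"umbrella": 1.0, "no umbrella": 0.0},
}
PRIOR = {w: 1 / len(WORLDS) for w in WORLDS}
COST = {u: 0.0 for u in UTTERANCES}


def softmax(scores, temp):
    """p(k) ∝ exp(temp * scores[k]); subtracting the max keeps exp stable."""
    m = max(scores.values())
    exps = {k: math.exp(temp * (v - m)) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def literal_listener(u):
    """L0(w | u) ∝ [[u]](w) P(w)."""
    scores = {w: MEANING[u][w] * PRIOR[w] for w in WORLDS}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def listener_policy(u, beta):
    """pi_L(a | u) ∝ exp(beta * sum_w L0(w | u) R(w, a))."""
    belief = literal_listener(u)
    expected = {a: sum(belief[w] * REWARD[w][a] for w in WORLDS) for a in ACTIONS}
    return softmax(expected, beta)


def relevance(u, w, beta):
    """U_R(u | w): expected reward of the listener's policy in the true world w."""
    return sum(p * REWARD[w][a] for a, p in listener_policy(u, beta).items())


def truthfulness(u, w, penalty):
    """U_T(u | w): 0 if u is true in w, else -penalty (assumed functional form)."""
    return 0.0 if MEANING[u][w] > 0 else -penalty


def speaker(w, lam, alpha=5.0, beta=5.0, penalty=1.0):
    """S(u | w) ∝ exp(alpha * (lam * U_R + (1 - lam) * U_T - C(u)))."""
    utility = {
        u: lam * relevance(u, w, beta)
        + (1 - lam) * truthfulness(u, w, penalty)
        - COST[u]
        for u in UTTERANCES
    }
    return softmax(utility, alpha)


print(speaker("drizzle", lam=1.0))  # relevance only: favors the false "clear"
print(speaker("drizzle", lam=0.0))  # truthfulness only: favors the true "wet"
```

In a drizzle, the true but vague "wet" leaves the listener at 50/50 between actions, while the false "clear" leads to the better action. With `lam=1.0` the speaker says "clear" about 92% of the time; with `lam=0.0` it says "wet" about 99% of the time, and intermediate values of `lam` interpolate between the two pressures.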