Overview

The Rational Speech Acts (RSA) framework aims to model how people make pragmatic inferences beyond the literal semantics of language.

RSA treats speakers as boundedly rational agents (i.e., agents whose decision-making is optimal with respect to both environmental utility and internal cognitive constraints) who optimize the trade-off between informativeness and utterance cost. Speakers make decisions by simulating a listener and selecting, from a fixed set of utterances, the one most likely to lead that listener to believe in the true world state.

Different listener models correspond to different speaker utilities:

Listener type	Description	Implementation	Speaker utility (of an utterance)
Literal $L_{0}$	Interpret utterances according to their “lexical meanings” $δ_{[u](w)}$: $P_{L_{0}}(w ∣ u) \propto δ_{[u](w)} P(w).$	Uniform distribution over possible world states.	Listener’s gain in certainty about the true world state after hearing the utterance: $U(u; w) = \log P_{L_{0}}(w ∣ u).$
Pragmatic $L_{1}$	Formalize Gricean pragmatics and account for the relevance of an utterance.	Recursively embed a speaker model, which in turn embeds a literal listener: $P_{L_{1}}(w ∣ u) \propto P_{S}(u ∣ w) P(w).$	Open research question.

Basic framework

The pragmatic listener infers the world state $w$ by conditioning on the speaker having chosen utterance $u$: $P_{L}(w ∣ u) \propto P_{S}(u ∣ w) P(w).$
The speaker $S$ chooses an utterance $u$ with probability proportional to its exponentiated utility (i.e., approximately rationally), where $α$ represents the extent to which utility is maximized (i.e., how rational the speaker is): $P_{S}(u ∣ w) \propto \exp(α \cdot U(u; w)).$
The speaker utility is determined by how certain a literal listener $L_{0}$ becomes about world state $w$ after hearing utterance $u$: $U(u; w) = \log P_{L_{0}}(w ∣ u).$
The literal listener updates beliefs about the world under the assumption that the utterance $u$ directly corresponds to a true semantic denotation $[u]$: $P_{L_{0}}(w ∣ u) \propto δ_{[u](w)} P(w).$
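
The four equations above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than a reference implementation: the function names, the two-world "some"/"all" lexicon, and the uniform prior are all assumptions made up for the example, and utterance cost is ignored.

```python
def normalize(weights):
    """Scale non-negative weights to sum to 1 (an all-zero input stays all zero)."""
    total = sum(weights.values())
    return {k: (v / total if total > 0 else 0.0) for k, v in weights.items()}


def literal_listener(u, lexicon, prior):
    """P_L0(w | u) ∝ δ_[u](w) · P(w)."""
    return normalize({w: float(lexicon[u][w]) * prior[w] for w in prior})


def speaker(w, lexicon, prior, alpha=1.0):
    """P_S(u | w) ∝ exp(α · U(u; w)) with U(u; w) = log P_L0(w | u)."""
    scores = {}
    for u in lexicon:
        p = literal_listener(u, lexicon, prior)[w]
        # exp(α · log p) equals p ** α; an utterance that is false in w
        # (p = 0, utility -inf) receives zero weight.
        scores[u] = p ** alpha if p > 0 else 0.0
    return normalize(scores)


def pragmatic_listener(u, lexicon, prior, alpha=1.0):
    """P_L(w | u) ∝ P_S(u | w) · P(w)."""
    return normalize(
        {w: speaker(w, lexicon, prior, alpha)[u] * prior[w] for w in prior}
    )


# Hypothetical toy domain: lexicon[u][w] is 1 if utterance u is true in world w.
lexicon = {
    "some": {"some_not_all": 1, "all": 1},
    "all": {"some_not_all": 0, "all": 1},
}
prior = {"some_not_all": 0.5, "all": 0.5}

l0 = literal_listener("some", lexicon, prior)
l1 = pragmatic_listener("some", lexicon, prior, alpha=1.0)
print(l0)
print(l1)
```

In this toy domain the literal listener is indifferent between the two worlds after hearing "some", while the pragmatic listener shifts probability toward the "some but not all" world, because a speaker in the "all" world would more often have said "all".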

Variants

Continuous semantics, after @2020degen

Integrating epistemic utility with decision-theoretic utility in speaker utilities, after @2024sumersReconciling

The truthfulness, or epistemic utility, of an utterance is defined as a penalty on false utterances. How strictly the penalty is enforced depends on the speaker’s softmax optimality $α$: values around $α = 1$ give a soft preference for true utterances, while $α \to \infty$ “recovers a more typical RSA constraint to true utterances”:

$$U_{\text{Truthfulness}}(u ∣ w) = \begin{cases} 1 & \text{if } δ_{[u](w)} = 1 \\ 0 & \text{if } δ_{[u](w)} = 0 \end{cases}$$

The listener evaluates each available action $a \in A \subseteq \mathcal{A}$ by its expected reward. The reward function $R : \mathcal{A} \times W \to \mathbb{R}$ assigns a scalar value to each action in each world; the expected reward is obtained by conditioning the listener’s beliefs about the world state on the utterance, then marginalizing over worlds:

$$R_{L}(a, u) = \sum_{w \in W} R(a, w) P_{L}(w ∣ u), \qquad P_{L}(w ∣ u) \propto δ_{[u](w)} P(w).$$

The listener’s decision policy is a softmax over expected rewards that chooses among the actions $A \subseteq \mathcal{A}$, where $β$ controls how reliably the listener picks the action with the highest expected reward:

$$π_{L}(a ∣ u, A) \propto \exp(β \cdot R_{L}(a, u)).$$
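
The listener side of this variant can be sketched as follows. The weather domain, the reward table, and all function names are hypothetical, chosen only to make the two formulas concrete; the sketch also assumes every utterance is true in at least one world, so the posterior is well defined.

```python
import math


def literal_posterior(u, lexicon, prior):
    """P_L(w | u) ∝ δ_[u](w) · P(w); assumes u is true in at least one world."""
    weights = {w: float(lexicon[u][w]) * prior[w] for w in prior}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def expected_reward(a, u, reward, lexicon, prior):
    """R_L(a, u) = Σ_w R(a, w) · P_L(w | u)."""
    posterior = literal_posterior(u, lexicon, prior)
    return sum(reward[a][w] * posterior[w] for w in prior)


def listener_policy(u, actions, reward, lexicon, prior, beta=1.0):
    """π_L(a | u, A) ∝ exp(β · R_L(a, u))."""
    scores = {
        a: math.exp(beta * expected_reward(a, u, reward, lexicon, prior))
        for a in actions
    }
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


# Hypothetical toy domain: two worlds, three utterances, two actions.
prior = {"rain": 0.5, "sun": 0.5}
lexicon = {
    "it is raining": {"rain": 1, "sun": 0},
    "it is sunny": {"rain": 0, "sun": 1},
    "there is weather": {"rain": 1, "sun": 1},
}
reward = {
    "umbrella": {"rain": 1.0, "sun": 0.0},
    "no umbrella": {"rain": 0.0, "sun": 1.0},
}
actions = ["umbrella", "no umbrella"]

policy_rain = listener_policy("it is raining", actions, reward, lexicon, prior, beta=2.0)
policy_vague = listener_policy("there is weather", actions, reward, lexicon, prior, beta=2.0)
print(policy_rain)
print(policy_vague)
```

An informative utterance pushes the policy toward the action that pays off in the described world, whereas the vacuous utterance leaves both actions equally likely.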

The relevance, or decision-theoretic utility, of an utterance is defined as the expected utility of the listener’s decision policy after hearing it; informally, more relevant utterances induce beliefs that improve listener decision-making:

$$U_{\text{Relevance}}(u ∣ w, A) = \sum_{a \in A} π_{L}(a ∣ u, A) R(a, w).$$

The speaker utility is a convex combination of truthfulness (epistemic accuracy) and relevance (decision-theoretic utility), with weight $λ \in [0, 1]$, less the cost $C(u)$ of producing the utterance:

$$U_{\text{Combined}}(u ∣ w, A) = λ \cdot U_{\text{Relevance}}(u ∣ w, A) + (1 - λ) \cdot U_{\text{Truthfulness}}(u ∣ w) - C(u).$$
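
As a rough sketch, the three utilities in this section can be computed together on a toy problem. Everything concrete here is an assumption for illustration: the weather domain, the reward table, the zero costs, the parameter values, and the function names. The sketch also assumes that cost enters as a penalty (it is subtracted) and that every utterance is true in at least one world.

```python
import math


def listener_policy(u, actions, reward, lexicon, prior, beta):
    """π_L(a | u, A) ∝ exp(β · Σ_w R(a, w) · P_L(w | u))."""
    weights = {w: float(lexicon[u][w]) * prior[w] for w in prior}
    total = sum(weights.values())  # assumes u is true in at least one world
    posterior = {w: v / total for w, v in weights.items()}
    scores = {
        a: math.exp(beta * sum(reward[a][w] * posterior[w] for w in prior))
        for a in actions
    }
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}


def truthfulness(u, w, lexicon):
    """1 if u is true in w, 0 otherwise."""
    return 1.0 if lexicon[u][w] else 0.0


def relevance(u, w, actions, reward, lexicon, prior, beta):
    """Expected reward, in the true world w, of the listener's policy after u."""
    policy = listener_policy(u, actions, reward, lexicon, prior, beta)
    return sum(policy[a] * reward[a][w] for a in actions)


def combined_utility(u, w, actions, reward, lexicon, prior, cost, lam, beta):
    """λ · relevance + (1 − λ) · truthfulness − cost (cost treated as a penalty)."""
    rel = relevance(u, w, actions, reward, lexicon, prior, beta)
    return lam * rel + (1.0 - lam) * truthfulness(u, w, lexicon) - cost[u]


# Hypothetical toy domain.
prior = {"rain": 0.5, "sun": 0.5}
lexicon = {
    "it is raining": {"rain": 1, "sun": 0},
    "it is sunny": {"rain": 0, "sun": 1},
    "there is weather": {"rain": 1, "sun": 1},
}
reward = {
    "umbrella": {"rain": 1.0, "sun": 0.0},
    "no umbrella": {"rain": 0.0, "sun": 1.0},
}
actions = ["umbrella", "no umbrella"]
cost = {u: 0.0 for u in lexicon}

utilities = {
    u: combined_utility(u, "rain", actions, reward, lexicon, prior, cost, lam=0.5, beta=2.0)
    for u in lexicon
}
for u, value in utilities.items():
    print(f"{u}: {value:.3f}")
```

When the true world is rainy, the true and useful utterance scores highest, the true but vacuous one sits in the middle, and the false one scores lowest; moving `lam` toward 1 makes the ranking depend only on how much the utterance helps the listener act well.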

(Model) The Rational Speech Acts framework, after Frank and Goodman (2016)
