Summary

Episodic reinforcement learning and meta-reinforcement learning are two techniques for making deep reinforcement learning fast. Episodic RL addresses the slowness caused by incremental parameter adjustment; meta-RL addresses the slowness caused by weak inductive bias.

Episodic meta-RL combines the two: an episodic memory system reinstates patterns of activity in a recurrent neural network, and that network implements the meta-learning process.

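The reinstatement idea can be sketched in a few lines. This is a toy illustration, not the architecture from the source: the random weights, the context keys, and every name here (`rnn_step`, `start_task`, `episodic_memory`) are invented for the example.

```python
import numpy as np

HIDDEN = 4
rng = np.random.default_rng(0)
W_in = rng.normal(size=(HIDDEN, 2))        # input-to-hidden weights (toy, untrained)
W_rec = rng.normal(size=(HIDDEN, HIDDEN))  # recurrent hidden-to-hidden weights

def rnn_step(h, x):
    """One 'inner loop' step: new hidden activation from old activation and input."""
    return np.tanh(W_rec @ h + W_in @ x)

episodic_memory = {}  # task context -> stored pattern of hidden activation

def start_task(context_id):
    """Reinstate the stored activation pattern for a familiar context,
    or start from zeros for a novel one."""
    return episodic_memory.get(context_id, np.zeros(HIDDEN))

# First encounter with task "A": start cold, run a few steps, store the result.
h = start_task("A")
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(h, x)
episodic_memory["A"] = h

# Re-encountering "A" later reinstates the pattern instead of starting cold.
h_again = start_task("A")
assert np.allclose(h_again, h)
```

The point of the sketch is only the control flow: the recurrent network carries its learning in activation patterns rather than weights, so storing and reinstating those patterns lets past learning resume immediately.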

Key terms

  • Neural network = a layered arrangement of learnable weights and biases that processes inputs to produce outputs.
    • Embedding = a learned internal representation of an observation, held in the activations of a neural network layer.
    • Deep neural network = a neural network with multiple hidden layers between input and output layers.
    • Convolutional neural network = a deep neural network that applies learned filters across grid-structured data such as images; CNNs are most often used for computer vision, and also appear in natural-language processing.
    • Recurrent neural network = a neural network that processes a sequence one step at a time as an “inner loop,” passing a pattern of activation from each step to the next.
  • Non-parametric model = a machine learning model where the number of parameters is not fixed, but grows as more data is provided.
  • Reinforcement learning = a process for learning from rewards and punishments instead of explicit instructions.
    • Sample efficiency = the amount of data required for a learning system to reach a target performance level.
    • Variance = the range of patterns that can be mastered by a learning procedure.
  • Behavioral policy = a mapping from states to actions, often implemented as a parameterized function for richer environments; reinforcement learning seeks a behavioral policy that will maximize long-term reward.
    • Catastrophic interference = when parameter adjustments made while learning a new behavior overwrite the effects of previous learning.
  • Inductive bias = initial assumptions about a pattern to be learned, which are used in a learning procedure.
    • Bias-variance trade-off = when stronger inductive biases lead to greater sample efficiency, but decreased variance.
  • Meta-learning = informally, “learning to learn”: one learning system adjusts the operation of a second learning system so that past learning increases the speed and efficiency of the second system’s new learning.
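
A few of these terms — behavioral policy, learning from reward rather than instruction, and incremental parameter adjustment — can be made concrete with a toy two-armed bandit learner. This is a minimal sketch under invented assumptions: the arm probabilities, learning rate, and exploration rate are all arbitrary, and the tabular value estimates stand in for a parameterized policy.

```python
import random

random.seed(0)

# True mean rewards of the two arms (unknown to the learner).
TRUE_MEANS = [0.2, 0.8]

# "Parameters": one estimated value per action, adjusted incrementally.
q = [0.0, 0.0]
ALPHA = 0.1  # small learning rate -> slow, incremental adjustment

def policy(epsilon=0.1):
    """Behavioral policy: map the (single) state to an action,
    mostly greedy with occasional exploration."""
    if random.random() < epsilon:
        return random.randrange(2)
    return max(range(2), key=lambda a: q[a])

for _ in range(1000):
    a = policy()
    # Reward is sampled, not given as an explicit instruction.
    r = 1.0 if random.random() < TRUE_MEANS[a] else 0.0
    q[a] += ALPHA * (r - q[a])  # incremental parameter adjustment
```

Because each update moves the estimates only a small step toward the latest reward, many trials are needed before the policy reliably favors the better arm — the slowness that episodic RL is meant to resolve.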