Natural Rationality | decision-making in the economy of nature


The brain and reinforcement learning

For about ten years, a growing body of data has suggested that the neural mechanisms of decision-making resemble reinforcement learning (RL) algorithms. According to a standard textbook,

Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics--trial-and-error search and delayed reward--are the two most important distinguishing features of reinforcement learning.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning : An introduction. Cambridge, Mass.: MIT Press.

Instead of starting with a priori information and built-in policies, the agent learns which policies to follow and which actions are worth taking. One important feature of RL is the prediction error signal, that is, the indication that an unexpected reward occurred or that an expected reward did not occur. Many neuroscientists think that midbrain dopaminergic neurons signal prediction errors and that we make decisions according to RL algorithms (see the summary in Read Montague's new book, Why Choose This Book?: How We Make Decisions).

A new study by Cohen and Ranganath adds some support to this "RL hypothesis". They tested the hypothesis that the "flexibility to adapt decision strategies based on recent outcomes (...) emerges through a reinforcement learning process, in which reward prediction errors are used dynamically to adjust representations of decision options". Subjects played a dynamic game in which the optimal policy requires changing decision strategies according to the rewards received (no fixed strategy is dominant). An RL model predicted which decisions subjects would make if they followed an RL algorithm, while scalp-recorded event-related brain potentials (ERPs) registered brain activity related to prediction errors. The RL model matched the subjects' behavior, and the ERPs matched the signaling of prediction errors. Thus the idea that the brain is an RL engine might be a good framework for understanding its functions.
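To make the notion of a prediction error concrete, here is a minimal Python sketch of a generic delta-rule learner. It is not the model used by Cohen and Ranganath, whose details are not given here. The function names, the learning rate `alpha`, the softmax `temperature`, and the two-option task with fixed reward probabilities are all assumptions chosen for illustration.

```python
import math
import random


def prediction_error(reward, expected):
    """Positive when the outcome beats the expectation, negative when it falls short."""
    return reward - expected


def update_value(expected, reward, alpha=0.1):
    """Delta rule: move the estimate a fraction alpha toward the observed reward."""
    return expected + alpha * prediction_error(reward, expected)


def softmax_choice(values, temperature=0.2, rng=random):
    """Pick an option index; options with higher learned value are chosen more often."""
    weights = [math.exp(v / temperature) for v in values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(values) - 1  # guard against floating-point rounding


def simulate(reward_probs, trials=2000, alpha=0.1, seed=0):
    """Hypothetical task: each option pays 1.0 with a fixed probability (an assumption)."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # no a priori information about the options
    for _ in range(trials):
        choice = softmax_choice(values, rng=rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        values[choice] = update_value(values[choice], reward, alpha)
    return values


if __name__ == "__main__":
    print(simulate([0.2, 0.8]))
```

The learner is never told which option is better. It only sees the difference between what it expected and what it got, and that difference alone drives both the value estimates and the choices. In a dynamic game like the one described above, the reward probabilities would shift with the opponent's behavior, so the estimates would keep tracking recent outcomes rather than settle on fixed values.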