Natural Rationality | decision-making in the economy of nature


A short primer on dopamine and TD learning

Much research indicates that dopaminergic systems play an important role in decision-making, and that their activity can be precisely described by TD algorithms. Here is a brief description, from a forthcoming paper of mine:

According to many findings, utility computation is realized by dopaminergic systems, a network of structures in ‘older’ brain areas highly involved in motivation and valuation (Berridge, 2003; Montague & Berns, 2002). Neuroscience has revealed their role in working memory, motivation, learning, decision-making, planning and motor control (Morris et al., 2006). Dopaminergic neurons are activated by stimuli related to primary rewards (juice, food) and by stimuli that recruit attention (new or intense ones). It is important to note that they do not encode hedonic experiences, but predictions of expected reward. Dopaminergic neurons respond selectively to prediction errors: the presence of an unexpected reward or the absence of an expected one. In other words, they detect discrepancies between predicted and experienced utility. Moreover, dopaminergic neurons learn from their mistakes: from these prediction errors they learn to predict future rewarding events and can then bias action choice.

Computational neuroscience has identified a class of reinforcement learning algorithms that mirror dopaminergic activity (Niv et al., 2005; Suri & Schultz, 2001). It is suggested that dopaminergic neurons broadcast to different brain areas a reward-prediction error signal similar to the one computed by the temporal-difference (TD) algorithms developed by computer scientists (Sutton & Barto, 1987, 1998). These dopaminergic mechanisms use sensory inputs to predict future rewards. The difference between successive value predictions is computed and constitutes an error signal. The model then updates a value function (the function that maps state-action pairs to numerical values) according to the prediction error. Thus TD-learning algorithms are neural mechanisms of decision-making under uncertainty implemented in dopaminergic systems.
These systems are involved not only in the prediction of basic rewards, such as food, but also in the valuation of abstract stimuli like art, branded goods, love or trust (Montague et al., 2006, p. 420). From the mouthwatering vision of a filet mignon in a red wine sauce to the intellectual contemplation of Andy Warhol’s Pop Art Brillo Boxes, the valuation mechanisms are essentially the same.
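To make the prediction-error idea concrete, here is a minimal sketch of a TD(0) update in Python. The toy task (two states, with a reward of 1.0 delivered on leaving the second state), the parameter values, and the variable names are all illustrative assumptions of mine, not taken from the paper:

```python
# Toy TD(0) learning: state 0 leads to state 1, and a reward of 1.0
# arrives at the end of each episode. "delta" is the reward-prediction
# error: the discrepancy between predicted and experienced value.

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.9   # discount factor (illustrative value)
V = {0: 0.0, 1: 0.0}  # value predictions, initially naive

for episode in range(500):
    # step 1: state 0 -> state 1, no reward yet
    delta = 0.0 + GAMMA * V[1] - V[0]
    V[0] += ALPHA * delta  # update prediction toward experience
    # step 2: state 1 -> end of episode, reward delivered
    delta = 1.0 - V[1]
    V[1] += ALPHA * delta

# As learning proceeds, prediction errors shrink toward zero:
# V[1] approaches 1.0 and V[0] approaches GAMMA * 1.0 = 0.9,
# so the reward ends up fully predicted from the earlier state.
print(V)
```

Note how the error signal is large early on (the reward is unexpected) and vanishes once the reward is predicted, mirroring the finding that dopaminergic responses transfer from the reward itself to the stimuli that predict it.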

Berridge, K. C. (2003). Irrational pursuits: Hyper-incentives from a visceral brain. In I. Brocas & J. Carrillo (Eds.), The psychology of economic decisions (pp. 17-40). Oxford: Oxford University Press.
Montague, P. R., & Berns, G. S. (2002). Neural economics and the biological substrates of valuation. Neuron, 36(2), 265-284.
Montague, P. R., King-Casas, B., & Cohen, J. D. (2006). Imaging valuation models in human choice. Annu Rev Neurosci, 29, 417-448.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9(8), 1057-1063.
Niv, Y., Duff, M. O., & Dayan, P. (2005). Dopamine, uncertainty and td learning. Behav Brain Funct, 1, 6.
Suri, R. E., & Schultz, W. (2001). Temporal difference model reproduces anticipatory neural activity. Neural Comput, 13(4), 841-862.
Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. Paper presented at the Ninth Annual Conference of the Cognitive Science Society.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, Mass.: MIT Press.