Natural Rationality | decision-making in the economy of nature

11/30/07

Values and regrets

Regret, from a decision-making point of view, is a counterfactual, post-hoc valuation of a decision. Regret and rejoicing are two varieties of remembered utility. Without a doubt, neuroeconomics can be informative about the neural mechanisms of regret. As I once argued, however, we need a neuroeconomic account of valuation with clear distinctions between different processes, mechanisms and functions, and between the contributions of different neural structures. A nice example:

In this paper:

it is said that "a cortical network, consisting of the medial orbitofrontal cortex, left superior frontal cortex, right angular gyrus, and left thalamus, correlates with the degree of regret. A different network, including the rostral anterior cingulate, left hippocampus, left ventral striatum, and brain stem/midbrain correlated with rejoice."

But, in another paper, regret, or its computational cousin "fictive learning signals", is now an outcome of midbrain dopaminergic systems:

How to fit it all together? I am not sure yet, but here are two possibilities:

1-a "same-level" explanation: dopaminergic systems and cortical networks contribute together to the feeling, emotions, and processing of regret/rejoicing. They are two faces of the same coin.

2-a "different-level" explanation: dopaminergic systems and cortical networks are two layers in a hierarchical multi-level architecture (that may have other level, e.g. molecular, etc.).

suggestions, ideas?



9/13/07

Cognitive Control and Dopamine: A Very Brief Intro

In certain situations, learned routines are not enough. When situations are too uncommon, dangerous or difficult, or when they require overcoming a habitual response, decisions must be guided by representations. Acting upon an internal representation is referred to, in cognitive science, as cognitive control or executive function[1]. The agent is led by a representation of a goal and will robustly readjust its behavior in order to maintain the pursuit of that goal. The behavior is then controlled ‘top-down’, not ‘bottom-up’. In the Stroop task, for instance, subjects must identify the color of written words such as ‘red’, ‘blue’ or ‘yellow’ printed in different colors (the word and the ink color do not match). The written word, however, primes the subject to focus on the meaning of the word instead of focusing on the ink’s color. If, for instance, the word “red” is written in yellow ink, subjects will utter “red” more readily than they say “yellow”. There is a cognitive conflict between the semantic priming induced by the word and the imperative to focus on the ink’s color. In this task, cognitive control mechanisms ought to give priority to goals in working memory (naming the ink color) over external affordances (semantic priming). An extreme lack of cognitive control is exemplified in subjects who suffer from “environmental dependency syndrome”[2]: they will spontaneously do whatever their environment indicates or affords: for instance, they will sit on a chair whenever they see one, or undress and get into a bed whenever they are in the presence of one (even if it is not in a bedroom).
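To make the ‘top-down’ vs. ‘bottom-up’ contrast concrete, here is a deliberately minimal Python sketch in the spirit of biased-competition accounts of cognitive control (e.g., Miller & Cohen, 2001); it is my own toy, not a model from the cited papers, and the pathway strengths, numbers and function name are arbitrary assumptions. A goal held in working memory adds gain to the weaker, task-relevant pathway so that it can beat the habitual one.

    def stroop_response(word_evidence, color_evidence, goal_bias):
        # word reading is the habitual, stronger pathway
        word_strength = 1.5 * word_evidence
        # color naming is weaker, but top-down control can add gain to it
        color_strength = 1.0 * color_evidence + goal_bias
        return "say the word" if word_strength > color_strength else "say the ink color"

    print(stroop_response(1.0, 1.0, goal_bias=0.0))  # no control: the word wins (an error)
    print(stroop_response(1.0, 1.0, goal_bias=1.0))  # goal active in working memory: ink color wins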

Cognitive control is thought to happen mostly in the prefrontal cortex (PFC),[3] an area strongly innervated by midbrain dopaminergic fibers. Activity in prefrontal areas is associated with the maintenance and updating of cognitive representations of goals. Moreover, impairment of these areas results in executive control deficits (such as the environmental dependency syndrome). Since working memory is limited, however, agents cannot hold everything in their prefrontal areas. Thus the brain faces a tradeoff between attending to environmental stimuli (which may signal rewards or dangers, for instance) and maintaining representations of goals, viz. the tradeoff between rapid updating and active maintenance [4]. Efficiency requires brains to focus on relevant information, and again, dopaminergic systems are involved in this process. According to many researchers[5], dopaminergic activity implements a ‘gating’ mechanism by which the PFC alternates between rapid updating and active maintenance. A higher level of dopamine in prefrontal areas signals the need to rapidly update goals in working memory (rapid updating: ‘opening the gate’), while a lower level induces resistance to afferent signals and thus a focus on represented goals (active maintenance: ‘shutting the gate’). Hence dopaminergic neurons select which information (goal representation or external environment) is worth paying attention to. This mechanism is thought to be implemented by different dopamine receptors, D1 and D2 being responsive to different dopamine concentrations (D1-low, D2-high):


Fig. 1 (From O'Reilly, 2006). Dopamine-based gating mechanism that emerges from the detailed biological model of Durstewitz, Seamans, and colleagues. The opening of the gate occurs in the dopamine D2-receptor–dominated state (State 1), in which any existing active maintenance is destabilized and the system is more responsive to inputs. The closing of the gate occurs in the D1-receptor–dominated state (State 2), which stabilizes the strongest activation pattern for robust active maintenance. D2 receptors are located synaptically and require high concentrations of dopamine and are therefore activated only during phasic dopamine bursts, which thus trigger rapid updating. D1 receptors are extrasynaptic and respond to lower concentrations, so robust maintenance is the default state of the system with normal tonic levels of dopamine firing.
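Reduced to its bare logic, the gating story can be sketched in a few lines of Python. This is a crude illustration of the idea, not O'Reilly's or Durstewitz and Seamans' model; the threshold and the numeric dopamine levels are arbitrary assumptions.

    def update_working_memory(memory, sensory_input, dopamine_level, burst_threshold=0.7):
        # High (phasic) dopamine -> D2-dominated State 1: the gate opens and the
        # content of working memory is rapidly updated from the input.
        # Low (tonic) dopamine -> D1-dominated State 2: the gate stays shut and the
        # currently maintained goal is protected from interference.
        if dopamine_level > burst_threshold:
            return sensory_input
        return memory

    goal = "name the ink color"
    goal = update_working_memory(goal, "read the word aloud", dopamine_level=0.2)   # maintained
    goal = update_working_memory(goal, "switch to a new task", dopamine_level=0.9)  # updated
    print(goal)  # -> "switch to a new task"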

Here is a neurobiological description of the phenomenon, with neuroanatomical details:



Fig. 2. (From O'Reilly, 2006). Dynamic gating produced by disinhibitory circuits through the basal ganglia and frontal cortex/PFC (one of multiple parallel circuits shown). (A) In the base state (no striatum activity) and when NoGo (indirect pathway) striatum neurons are firing more than Go, the SNr (substantia nigra pars reticulata) is tonically active and inhibits excitatory loops through the basal ganglia and PFC through the thalamus. This corresponds to the gate being closed, and PFC continues to robustly maintain ongoing activity (which does not match the activity pattern in the posterior cortex, as indicated). (B) When direct pathway Go neurons in striatum fire, they inhibit the SNr and thus disinhibit the excitatory loops through the thalamus and the frontal cortex, producing a gating-like modulation that triggers the update of working memory representations in prefrontal cortex. This corresponds to the gate being open.
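The disinhibitory logic of Fig. 2 can likewise be caricatured in a few lines (again my own toy, not the actual circuit model; all the numbers are made up): the SNr is tonically active, and only when striatal Go firing outweighs NoGo firing is it suppressed enough to release the thalamocortical loop and open the gate.

    def bg_gate_open(go_activity, nogo_activity, tonic_snr=1.0):
        # Direct-pathway (Go) activity inhibits the SNr; indirect-pathway (NoGo)
        # activity opposes it. The SNr in turn inhibits the thalamus, so the
        # thalamocortical loop is released only when SNr activity drops low enough.
        snr_activity = max(0.0, tonic_snr - (go_activity - nogo_activity))
        return snr_activity < 0.5  # True = thalamus disinhibited = PFC gate open

    print(bg_gate_open(go_activity=0.2, nogo_activity=0.4))  # False: gate closed, PFC maintains
    print(bg_gate_open(go_activity=0.9, nogo_activity=0.1))  # True: gate open, PFC updates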

It is thus interesting to note that dopaminergic neurons are involved both in basic motivation and reinforcement and in more abstract operations such as cognitive control.



Notes and references
  1. (Norman & Shallice, 1980; Shallice, 1988)
  2. (Lhermitte, 1986)
  3. (Duncan, 1986; Koechlin, Ody, & Kouneiher, 2003; Miller & Cohen, 2001; O’Reilly, 2006)
  4. (O’Reilly, 2006)
  5. (Montague, Hyman, & Cohen, 2004; O'Donnell, 2003; O’Reilly, 2006)

  • Durstewitz, D., Seamans, J. K., & Sejnowski, T. J. (2000). Dopamine-Mediated Stabilization of Delay-Period Activity in a Network Model of Prefrontal Cortex. Journal of Neurophysiology, 83(3), 1733-1750.
  • Duncan, J. (1986). Disorganization of behavior after frontal lobe damage. Cognitive Neuropsychology, 3(3), 271-290.
  • Koechlin, E., Ody, C., & Kouneiher, F. (2003). The Architecture of Cognitive Control in the Human Prefrontal Cortex. Science, 302(5648), 1181-1185.
  • Lhermitte, F. (1986). Human autonomy and the frontal lobes. Part II: Patient behavior in complex and social situations: The “environmental dependency syndrome.” Annals of Neurology, 19(4), 335–343.
  • Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24(1), 167-202.
  • Montague, P. R., Hyman, S. E., & Cohen, J. D. (2004). Computational roles for dopamine in behavioural control. Nature, 431(7010), 760.
  • Norman, D. A., & Shallice, T. (1980). Attention to Action: Willed and Automatic Control of Behavior: Center for Human Information Processing, University of California, San Diego.
  • O'Donnell, P. (2003). Dopamine gating of forebrain neural ensembles. European Journal of Neuroscience, 17(3), 429-435.
  • O’Reilly, R. C. (2006). Biologically Based Computational Models of High-Level Cognition. Science, 314, 91-94.
  • Shallice, T. (1988). From neuropsychology to mental structure. Cambridge [England] ; New York: Cambridge University Press.



8/1/07

A basic mode of behavior: a review of reinforcement learning, from a computational and biological point of view.

The journal Frontiers of Interdisciplinary Research in the Life Sciences (HFSP Publishing) has made its first issue freely available online. The journal specializes in "innovative interdisciplinary research at the interface between biology and the physical sciences." An excellent paper (complete, clear, exhaustive) by Kenji Doya presents a state-of-the-art review of reinforcement learning, both as a computational theory (the procedures) and as a biological mechanism (neural activity). It is exactly what the title announces: Reinforcement learning: Computational theory and biological mechanisms. The paper covers research in neuroscience, AI, computer science, robotics, neuroeconomics, and psychology. See this nice schema of reinforcement learning in the brain:



(From the paper:) A schematic model of implementation of reinforcement learning in the cortico-basal ganglia circuit (Doya, 1999, 2000). Based on the state representation in the cortex, the striatum learns state and action value functions. The state value coding striatal neurons project to dopamine neurons, which send the TD signal back to the striatum. The outputs of action value coding striatal neurons channel through the pallidum and the thalamus, where stochastic action selection may be realized.
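For readers who prefer code to schemas, here is a compressed actor-critic sketch of the same arrangement. It is my own toy Python, not Doya's model; the two cues, the reward contingency, the learning rate and the softmax temperature are all arbitrary assumptions. A critic learns state values, an actor learns action values, a single TD error plays the dopamine role, and action selection is stochastic, standing in for the pallidum/thalamus stage.

    import math
    import random

    states, actions = ["cue_A", "cue_B"], ["left", "right"]
    V = {s: 0.0 for s in states}                        # critic: state values ('striatum')
    Q = {(s, a): 0.0 for s in states for a in actions}  # actor: action values ('striatum')
    alpha = 0.1

    def softmax_choice(s, temperature=0.3):
        # stochastic action selection, standing in for the pallidum/thalamus stage
        weights = [math.exp(Q[(s, a)] / temperature) for a in actions]
        return random.choices(actions, weights=weights)[0]

    def reward(s, a):
        # toy one-step environment: 'left' pays off after cue_A, nothing otherwise
        return 1.0 if (s == "cue_A" and a == "left") else 0.0

    for _ in range(1000):
        s = random.choice(states)
        a = softmax_choice(s)
        td_error = reward(s, a) - V[s]   # one-step episode: no successor-state term
        V[s] += alpha * td_error         # critic update ('dopamine' teaching the striatum)
        Q[(s, a)] += alpha * td_error    # actor update, driven by the same broadcast TD signal
    print(Q)  # Q[('cue_A', 'left')] ends up clearly the largest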

This is exactly what a theory of natural rationality (and economics tout court) needs: plausible, tractable, and real computational mechanisms grounded in neurobiology. As Selten once said, speaking of reinforcement learning:

a theory of bounded rationality cannot avoid this basic mode of behavior (Selten, 2001, p. 16)


References

  • Doya, K. (2007). Reinforcement learning: Computational theory and biological mechanisms. HFSP Journal, 1(1).



7/26/07

Special issue of NYAS on biological decision-making

The May issue of the Annals of the New York Academy of Sciences is devoted to Reward and Decision Making in Corticobasal Ganglia Networks. Many big names in decision neuroscience (Berns, Knutson, Delgado, etc.) contributed.


Introduction. Current Trends in Decision Making
Bernard W Balleine, Kenji Doya, John O'Doherty, Masamichi Sakagami

Learning about Multiple Attributes of Reward in Pavlovian Conditioning
ANDREW R DELAMATER, STEPHEN OAKESHOTT

Should I Stay or Should I Go? Transformation of Time-Discounted Rewards in Orbitofrontal Cortex and Associated Brain Circuits
MATTHEW R ROESCH, DONNA J CALU, KATHRYN A BURKE, GEOFFREY SCHOENBAUM

Model-Based fMRI and Its Application to Reward Learning and Decision Making
JOHN P O'DOHERTY, ALAN HAMPTON, HACKJIN KIM

Splitting the Difference. How Does the Brain Code Reward Episodes?
BRIAN KNUTSON, G. ELLIOTT WIMMER

Reward-Related Responses in the Human Striatum
MAURICIO R DELGADO

Integration of Cognitive and Motivational Information in the Primate Lateral Prefrontal Cortex
MASAMICHI SAKAGAMI, MASATAKA WATANABE

Mechanisms of Reinforcement Learning and Decision Making in the Primate Dorsolateral Prefrontal Cortex
DAEYEOL LEE, HYOJUNG SEO


Resisting the Power of Temptations. The Right Prefrontal Cortex and Self-Control
DARIA KNOCH, ERNST FEHR

Adding Prediction Risk to the Theory of Reward Learning
KERSTIN PREUSCHOFF, PETER BOSSAERTS

Still at the Choice-Point. Action Selection and Initiation in Instrumental Conditioning
BERNARD W BALLEINE, SEAN B OSTLUND

Plastic Corticostriatal Circuits for Action Learning. What's Dopamine Got to Do with It?
RUI M COSTA

Striatal Contributions to Reward and Decision Making. Making Sense of Regional Variations in a Reiterated Processing Matrix
JEFFERY R WICKENS, CHRISTOPHER S BUDD, BRIAN I HYLAND, GORDON W ARBUTHNOTT

Multiple Representations of Belief States and Action Values in Corticobasal Ganglia Loops
KAZUYUKI SAMEJIMA, KENJI DOYA

Basal Ganglia Mechanisms of Reward-Oriented Eye Movement
OKIHIDE HIKOSAKA

Contextual Control of Choice Performance. Behavioral, Neurobiological, and Neurochemical Influences
JOSEPHINE E HADDON, SIMON KILLCROSS

A "Good Parent" Function of Dopamine. Transient Modulation of Learning and Performance during Early Stages of Training
JON C HORVITZ, WON YUNG CHOI, CECILE MORVAN, YANIV EYNY, PETER D BALSAM

Serotonin and the Evaluation of Future Rewards. Theory, Experiments, and Possible Neural Mechanisms
NICOLAS SCHWEIGHOFER, SAORI C TANAKA, KENJI DOYA

Receptor Theory and Biological Constraints on Value
GREGORY S BERNS, C. MONICA CAPRA, CHARLES NOUSSAIR

Reward Prediction Error Computation in the Pedunculopontine Tegmental Nucleus Neurons
YASUSHI KOBAYASHI, KEN-ICHI OKADA

A Computational Model of Craving and Obsession
A. DAVID REDISH, ADAM JOHNSON

Calculating the Cost of Acting in Frontal Cortex
MARK E WALTON, PETER H RUDEBECK, DAVID M BANNERMAN, MATTHEW F. S RUSHWORTH

Cost, Benefit, Tonic, Phasic. What Do Response Rates Tell Us about Dopamine and Motivation?
YAEL NIV



7/10/07

Neuropsychopharmacology textbook: this is your brain on drugs

I discovered (thanks to Mind Hacks) that the American College of Neuropsychopharmacology has put online a textbook on, well, Neuropsychopharmacology. An enormous source of information on how drugs affect the brain. Here is (from p. 120) a schema of dopaminergic systems:




Visit the link below to read the online textbook:



6/29/07

Fruit fly neuroeconomics and ecological rationality

In the latest issue of Science, Zhang et al. examine the decision-making mechanisms of fruit flies (Drosophila). They used mutant flies to see if dopaminergic systems are necessary for saliency-based, or value-based, decision-making. Value-based decision-making is contrasted with perceptual, or simple, decision-making. In the latter, decisions are made only by integrating sensory cues; in the former, a value is also assigned to the available options. Values are especially important when there is conflicting evidence or when stimuli are ambiguous.


As many studies show, dopaminergic (DA) systems are an important--maybe the most important--valuation mechanism. They link stimuli to expected value. Flies, bees, monkeys and humans all rely on DA neuromodulation to make decisions. Fruit flies provide an interesting opportunity for neuroeconomics: they allow scientists to create genetic mutants (in this case, flies whose DA system shuts off above 30°C) and analyse their behavior. Zhang and his collaborators discovered that flies without DA activity are able to make decisions when they face a well-known situation, where it is easy to choose, but are inefficient when they face conflicting stimuli. Hence, when valuation is needed, DA is required: no DA, no valuation, no value-based decision-making.

Conceptually, the study shows how the concepts of value-based and perceptual decision-making can be separated. The 'simple heuristics' and ecological rationality program made a strong case for simple decision-making: you just "see" the best option, e.g. which city is bigger, Munich or Dortmund? Since the size of a city is correlated with its exposure in the media, it is easy to answer by using a simple heuristic (choose the most familiar). In this case, there is no need to evaluate options. But when you have to choose where you want to live, values are important. In this case you need a preference ranking, and preferences seem inherently tied to DA activity.
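The contrast can be made concrete in a few lines of Python. This is a toy of mine, not code from the paper or from the simple-heuristics literature; the cities and utility numbers are arbitrary. The recognition heuristic needs no valuation at all, while value-based choice presupposes a preference ranking, which is where DA-based valuation comes in.

    def recognition_heuristic(option_a, option_b, recognized):
        # 'Perceptual'/simple decision: pick the recognized option; no values needed.
        if (option_a in recognized) != (option_b in recognized):
            return option_a if option_a in recognized else option_b
        return None  # both or neither recognized: the heuristic does not discriminate

    def value_based_choice(options, utility):
        # Value-based decision: requires a preference ranking over the options.
        return max(options, key=utility)

    print(recognition_heuristic("Munich", "Dortmund", recognized={"Munich"}))   # Munich
    print(value_based_choice(["Munich", "Dortmund"],
                             utility={"Munich": 0.4, "Dortmund": 0.9}.get))     # Dortmund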


 

Reference:

Zhang, K., Guo, J. Z., Peng, Y., Xi, W., & Guo, A. (2007). Dopamine-Mushroom Body Circuit Regulates Saliency-Based Decision-Making in Drosophila. Science, 316(5833), 1901-1904.



4/22/07

marginal utility, value and the brain

Economics assumes the principle of diminishing marginal utility, i.e., the utility of a good increases more and more slowly as the quantity consumed increases (Wikipedia). Mathematically, it means that the utility of a monetary gain is not a linear function of its monetary value. Before Bernoulli's St. Petersburg paradox (1738/1954), the expected value of a gamble was construed as the sum, over outcomes, of the objective (for instance, monetary) value of each outcome multiplied by its probability. Suppose, then, a gambler is offered the following lottery:

A fair coin is tossed. If the outcome is heads, the lottery ends and you win $2. If the outcome is tails, the coin is tossed again; if that toss comes up heads, the lottery ends and you win $4, and so on. If the first heads occurs on the nth toss, you win $2^n.

Summing the products of probability and value leads to an infinite expected value:

(0.5 × 2) + (0.25 × 4) + (0.125 × 8) + … = 1 + 1 + 1 + … = ∞

After 30 tosses, the gambler could win more than a billion dollars. How much would it be worth paying for a ticket? If a rational agent maximizes expected value, he or she should be willing to buy a ticket for this lottery at any finite price, considering that the expected value of this prospect is infinite. But, as Hacking pointed out, “few of us would pay even $25 to enter such a game” (Hacking, 1980). When Bernoulli offered scholars in St. Petersburg the chance to play this lottery, nobody was interested. Bernoulli concluded that the utility function is not linear but logarithmic. Hence the subjective value of $10 is different depending on whether you are Bill Gates or a homeless person. Bernoulli’s discussion of the St. Petersburg paradox is often considered one of the first economic experiments (Roth, 1993, p. 3).
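A quick numerical check of both claims, in a few lines of Python; the only assumption beyond the text is Bernoulli's own proposal that the gambler's utility is the logarithm of the prize.

    import math

    # Expected monetary value: each possible outcome contributes (1/2**n) * 2**n = 1,
    # so the partial sums grow without bound (30 after 30 tosses, and so on).
    print(sum((0.5 ** n) * (2 ** n) for n in range(1, 31)))           # 30.0

    # Expected *log* utility converges, so the lottery is worth only a few dollars
    # to a Bernoullian gambler (certainty equivalent: exp of the expected utility).
    expected_log_utility = sum((0.5 ** n) * math.log(2 ** n) for n in range(1, 200))
    print(math.exp(expected_log_utility))                              # about 4.0

So a log-utility gambler values the whole lottery at roughly $4, nowhere near the infinite price that expected-value maximization would license.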

A new study in neuroeconomics (Tobler et al.) indicates that the brain's valuation mechanisms follow this principle. Subjects in the experiment had to learn whether a particular abstract shape--shown on a computer screen--predicted a monetary reward (a picture of a 20 pence coin) or not (a scrambled picture of the coin). If the utility of money has diminishing marginal value, then money should be more important for poorer people than for richer ones--"more important" meaning that the former would learn the reward-prediction patterns faster and would display more activity in reward-related areas. Bingo! That's exactly what happened. Midbrain dopaminergic regions were more strongly engaged in the poorer subjects. The valuation mechanisms obey diminishing marginal utility.

This suggests that midbrain dopaminergic systems (about which I blogged earlier; see also the references at the end of this post) are the seat of our natural rationality, or at least one of its major components. These systems compute utility, stimulate motivation and attention, send reward-prediction error signals, learn from these signals and devise behavioral policies. They do not encode anticipated or experienced utility (other zones are recruited for these: the amygdala and nucleus accumbens for experienced utility, the OFC for anticipated utility, etc.), but decision utility, the cost/benefit analysis of a possible decision.


References

  • Bernoulli, D. (1738/1954). Exposition of a new theory on the measurement of risk. Econometrica, 22, 23-36.
  • Hacking, I. (1980). Strange expectations. Philosophy of Science, 47, 562-567.
  • Roth, A. E. (1993). On the early history of experimental economics. Journal of the History of Economic Thought, 15, 184-209.
  • Tobler, P. N., Fletcher, P. C., Bullmore, E. T., & Schultz, W. (2007). Learning-related human brain activations reflecting individual finances. Neuron, 54(1), 167-175.
On dopaminergic systems:
  • Ahmed, S. H. (2004). Neuroscience. Addiction as compulsive reward prediction. Science, 306(5703), 1901-1902.
  • Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1), 129.
  • Berridge, K. C. (2003). Pleasures of the brain. Brain and Cognition, 52(1), 106.
  • Berridge, K. C., & Robinson, T. E. (1998). What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev, 28(3), 309-369.
  • Cohen, J. D., & Blum, K. I. (2002). Reward and decision. Neuron, 36(2), 193-198.
  • Daw, N. D., & Doya, K. (2006). The computational neurobiology of learning and reward. Curr Opin Neurobiol, 16(2), 199-204.
  • Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in td models of the dopamine system. Neural Comput, 14(11), 2567-2583.
  • Dayan, P., & Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36(2), 285-298.
  • Di Chiara, G., & Bassareo, V. (2007). Reward system and addiction: What dopamine does and doesn't do. Curr Opin Pharmacol, 7(1), 69-76.
  • Egelman, D. M., Person, C., & Montague, P. R. (1998). A computational role for dopamine delivery in human decision-making. J Cogn Neurosci, 10(5), 623-630.
  • Floresco, S. B., & Magyar, O. (2006). Mesocortical dopamine modulation of executive functions: Beyond working memory. Psychopharmacology (Berl), 188(4), 567-585.
  • Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306(5703), 1940-1943.
  • Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Netw, 15(4-6), 535-547.
  • Kakade, S., & Dayan, P. (2002). Dopamine: Generalization and bonuses. Neural Netw, 15(4-6), 549-559.
  • McCoy, A. N., & Platt, M. L. (2004). Expectations and outcomes: Decision-making in the primate brain. J Comp Physiol A Neuroethol Sens Neural Behav Physiol.
  • Montague, P. R., Hyman, S. E., & Cohen, J. D. (2004). Computational roles for dopamine in behavioural control. Nature, 431(7010), 760.
  • Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9(8), 1057-1063.
  • Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., & Hikosaka, O. (2004). Dopamine neurons can represent context-dependent prediction error. Neuron, 41(2), 269-280.
  • Nieoullon, A. (2002). Dopamine and the regulation of cognition and attention. Progress in Neurobiology, 67(1), 53.
  • Niv, Y., Daw, N. D., & Dayan, P. (2006). Choice values. Nat Neurosci, 9(8), 987-988.
  • Niv, Y., Duff, M. O., & Dayan, P. (2005). Dopamine, uncertainty and td learning. Behav Brain Funct, 1, 6.
  • Redish, A. D. (2004). Addiction as a computational process gone awry. Science, 306(5703), 1944-1947.
  • Schultz, W. (1999). The reward signal of midbrain dopamine neurons. News Physiol Sci, 14(6), 249-255.
  • Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599.
  • Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annu Rev Neurosci, 23, 473-500.
  • Self, D. (2003). Neurobiology: Dopamine as chicken and egg. Nature, 422(6932), 573-574.
  • Suri, R. E. (2002). Td models of reward predictive responses in dopamine neurons. Neural Netw, 15(4-6), 523-533.
  • Tom, S. M., Fox, C. R., Trepel, C., & Poldrack, R. A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811), 515-518.
  • Ungless, M. A. (2004). Dopamine: The salient issue. Trends Neurosci, 27(12), 706.



3/22/07

A short primer on dopamine and TD learning

Much research indicates that dopaminergic systems have an important role in decision-making, and that their activity can be precisely described by TD algorithms. Here is a brief description, from a forthcoming paper of mine:
--

According to many findings, utility computation is realized by dopaminergic systems, a network of structures in ‘older’ brain areas highly involved in motivation and valuation (Berridge, 2003; Montague & Berns, 2002). Neuroscience has revealed their role in working memory, motivation, learning, decision-making, planning and motor control (Morris et al., 2006). Dopaminergic neurons are activated by stimuli related to primary rewards (juice, food) and by stimuli that recruit attention (new or intense ones). It is important to note that they do not encode hedonic experiences, but predictions of expected reward. Dopaminergic neurons respond selectively to prediction errors: the presence of an unexpected reward or the absence of an expected one. In other words, they detect the discrepancies between predicted and experienced utility. Moreover, dopaminergic neurons learn from their mistakes: from these prediction errors they learn to predict future rewarding events and can then bias action choice. Computational neuroscience has identified a class of reinforcement learning algorithms that mirror dopaminergic activity (Niv et al., 2005; Suri & Schultz, 2001). It is suggested that dopaminergic neurons broadcast to different brain areas a reward-prediction error signal similar to the one computed by the temporal difference (TD) algorithms developed by computer scientists (Sutton & Barto, 1987, 1998). This dopaminergic mechanism uses sensory inputs to predict future rewards. The difference between successive value predictions is computed and constitutes an error signal. The model then updates a value function (the function that maps state-action pairs to numerical values) according to the prediction error. Thus TD-learning algorithms describe neural mechanisms of decision-making under uncertainty implemented in dopaminergic systems. They are involved not only in the prediction of basic rewards, such as food, but also in the valuation of abstract stimuli like art, branded goods, love or trust (Montague et al., 2006, p. 420). From the mouthwatering vision of a filet mignon in red wine sauce to the intellectual contemplation of Andy Warhol’s Pop Art Brillo Boxes, the valuation mechanisms are essentially the same.
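For concreteness, here is a minimal TD(0) sketch of that prediction-error signal (toy Python of mine, not from the paper; the two-state episode, learning rate and reward value are arbitrary assumptions). The error delta = r + gamma·V(s') − V(s) plays the dopaminergic role.

    gamma, alpha = 1.0, 0.1
    V = {"cue": 0.0, "outcome": 0.0}   # value predictions for a two-step episode

    for episode in range(300):
        # step 1: the cue is shown, no reward yet, the next state is the outcome
        delta = 0.0 + gamma * V["outcome"] - V["cue"]     # TD / 'dopamine' error at the cue
        V["cue"] += alpha * delta
        # step 2: the outcome delivers the reward and the episode ends
        delta = 1.0 - V["outcome"]                        # error at reward delivery
        V["outcome"] += alpha * delta

    print(V)  # both values approach 1.0: the cue now predicts the reward

Early in training the large error occurs at reward delivery; once the values have converged, a fully predicted reward generates no error, which mirrors the transfer of the dopaminergic response from the reward to the predictive cue reported by Schultz, Dayan and Montague (1997).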

Berridge, K. C. (2003). Irrational pursuits: Hyper-incentives from a visceral brain. In I. Brocas & J. Carrillo (Eds.), The psychology of economic decisions (pp. 17-40). Oxford: Oxford University Press.
Montague, P. R., & Berns, G. S. (2002). Neural economics and the biological substrates of valuation. Neuron, 36(2), 265-284.
Montague, P. R., King-Casas, B., & Cohen, J. D. (2006). Imaging valuation models in human choice. Annu Rev Neurosci, 29, 417-448.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9(8), 1057-1063.
Niv, Y., Duff, M. O., & Dayan, P. (2005). Dopamine, uncertainty and td learning. Behav Brain Funct, 1, 6.
Suri, R. E., & Schultz, W. (2001). Temporal difference model reproduces anticipatory neural activity. Neural Comput, 13(4), 841-862.
Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. Paper presented at the Ninth Annual Conference of the Cognitive Science Society.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning : An introduction. Cambridge, Mass.: MIT Press.