Temporal-Difference Reinforcement Learning with Distributed Representations
[1] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience.
[2] G. Madden, et al. Discounting of delayed rewards in opioid-dependent outpatients: exponential or hyperbolic discounting functions?, 1999, Experimental and Clinical Psychopharmacology.
[3] J. E. Mazur. Choice, delay, probability, and conditioned reinforcement, 1997.
[4] M. Kawato, et al. Efficient reinforcement learning: computational theories, neuroscience and robotics, 2007, Current Opinion in Neurobiology.
[5] P. Glimcher, et al. Statistics of midbrain dopamine neuron spike trains in the awake primate, 2007, Journal of Neurophysiology.
[6] Peter D. Sozou, et al. On hyperbolic discounting and uncertain hazard rates, 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.
[7] Adam Johnson, et al. Reconstruction of the postsubiculum head direction signal from neural ensembles, 2005, Hippocampus.
[8] R. Wightman, et al. Dopamine release is heterogeneous within microenvironments of the rat nucleus accumbens, 2007, The European Journal of Neuroscience.
[9] Tracey J. Shors, et al. Memory traces of trace memories: neurogenesis, synaptogenesis and awareness, 2004, Trends in Neurosciences.
[10] A. Redish, et al. Addiction as a Computational Process Gone Awry, 2004, Science.
[11] J. E. Mazur. Hyperbolic value addition and general models of animal choice, 2001, Psychological Review.
[12] K. Allen, et al. Dorsal, ventral, and complete excitotoxic lesions of the hippocampus in rats failed to impair appetitive trace conditioning, 2007, Behavioural Brain Research.
[13] Peter L. Strick, et al. Macro-organization of the circuits connecting the basal ganglia with the cortical motor areas, 1995.
[14] A. Dickinson, et al. Neuronal coding of prediction errors, 2000, Annual Review of Neuroscience.
[15] Louis D. Matzel, et al. The Role of the Hippocampus in Trace Conditioning: Temporal Discontinuity or Task Difficulty?, 2001, Neurobiology of Learning and Memory.
[16] James C. Houk, et al. Adaptive Critics and the Basal Ganglia, 1994.
[17] David S. Touretzky, et al. Representation and Timing in Theories of the Dopamine System, 2006, Neural Computation.
[18] Kenji Doya, et al. Humans Can Adopt Optimal Discounting Strategy under Real-Time Constraints, 2006, PLoS Computational Biology.
[19] W. Schultz, et al. Importance of unpredictability for reward responses in primate dopamine neurons, 1994, Journal of Neurophysiology.
[20] A. Barto. Adaptive Critics and the Basal Ganglia, 1995.
[21] N. Daw, et al. Reinforcement learning models of the dopamine system and their behavioral implications, 2003.
[22] W. Schultz, et al. Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons, 2003, Science.
[23] William B. Levy, et al. The formation of neural codes in the hippocampus: trace conditioning as a prototypical paradigm for studying the random recoding hypothesis, 2005, Biological Cybernetics.
[24] Mitsuo Kawato, et al. Inter-module credit assignment in modular reinforcement learning, 2003, Neural Networks.
[25] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[26] K. Doya, et al. Representation of Action-Specific Reward Values in the Striatum, 2005, Science.
[27] D. Rubin, et al. The Precise Time Course of Retention, 1999.
[28] John Odentrantz, et al. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, 2000, Technometrics.
[29] W. K. Bickel, et al. Polydrug abuse in heroin addicts: a behavioral economic analysis, 1998, Addiction.
[30] W. Schultz, et al. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task, 1999, Neuroscience.
[31] R. Wightman, et al. Coordinated Accumbal Dopamine Release and Neural Activity Drive Goal-Directed Behavior, 2007, Neuron.
[32] W. Schultz, et al. Responses of monkey dopamine neurons during learning of behavioral reactions, 1992, Journal of Neurophysiology.
[33] Ann M. Graybiel, et al. Striosomes and Matrisomes, 1991.
[34] W. Schultz. Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology, 2004, Current Opinion in Neurobiology.
[35] C. Gallistel, et al. Time, rate, and conditioning, 2000, Psychological Review.
[36] K. Doya. Complementary roles of basal ganglia and cerebellum in learning and motor control, 2000, Current Opinion in Neurobiology.
[37] Jadin C. Jackson, et al. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling, 2007, Psychological Review.
[38] D. Vere-Jones. Markov Chains, 1972, Nature.
[39] D. Read. Is Time-Discounting Hyperbolic or Subadditive?, 2001.
[40] A. Kacelnik. Normative and descriptive models of decision making: time discounting and risk sensitivity, 2007, Ciba Foundation Symposium.
[41] Kenji Doya, et al. Multiple model-based reinforcement learning explains dopamine neuronal activity, 2007, Neural Networks.
[42] P. Glimcher, et al. Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal, 2005, Neuron.
[43] G. E. Alexander, et al. Parallel organization of functionally segregated circuits linking basal ganglia and cortex, 1986, Annual Review of Neuroscience.
[44] Karl J. Friston, et al. Temporal Difference Models and Reward-Related Learning in the Human Brain, 2003, Neuron.
[45] P. Killeen, et al. The matching law, 1972, Journal of the Experimental Analysis of Behavior.
[46] A. Cooper, et al. Predictive Reward Signal of Dopamine Neurons, 2011.
[47] G. E. Alexander, et al. Functional architecture of basal ganglia circuits: neural substrates of parallel processing, 1990, Trends in Neurosciences.
[48] Saori C. Tanaka, et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops, 2004, Nature Neuroscience.
[49] T. Robbins, et al. Impulsive Choice Induced in Rats by Lesions of the Nucleus Accumbens Core, 2001, Science.
[50] David Laibson, et al. An economic perspective on addiction and matching, 1996, Behavioral and Brain Sciences.
[51] Amy L. Odum, et al. Discounting of delayed health gains and losses by current, never- and ex-smokers of cigarettes, 2002, Nicotine & Tobacco Research.
[52] W. Schultz, et al. Evidence that the delay-period activity of dopamine neurons corresponds to reward uncertainty rather than backpropagating TD errors, 2005, Behavioral and Brain Functions.
[53] Richard S. Sutton, et al. A computational model of hippocampal function in trace conditioning, 2008, NIPS.
[55] Asohan Amarasingham, et al. Internally Generated Cell Assembly Sequences in the Rat Hippocampus, 2008, Science.
[56] Asohan Amarasingham, et al. Internally Generated Cell Assembly Sequences in the Rat Hippocampus, 2011.
[57] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[58] Peter Dayan, et al. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, 2001.
[59] F. Robert Jacobs, et al. Batch Construction Heuristics and Storage Assignment Strategies for Walk/Ride and Pick Systems, 1999.
[60] R. Vuchinich, et al. Hyperbolic temporal discounting in social drinkers and problem drinkers, 1998, Experimental and Clinical Psychopharmacology.
[61] S. Mitchell, et al. Measures of impulsivity in cigarette smokers and non-smokers, 1999, Psychopharmacology.
[62] David S. Touretzky, et al. Dopamine and inference about timing, 2002, Proceedings 2nd International Conference on Development and Learning (ICDL 2002).
[63] George Ainslie, et al. A Marketplace in the Brain?, 2004, Science.
[64] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.
[65] George Ainslie, et al. A marketplace in the brain?, 2004, Science.
[66] Jadin C. Jackson, et al. Detecting dynamical changes within a simulated neural ensemble using a measure of representational quality, 2003, Network.
[67] R. Wightman, et al. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens, 2007, Nature Neuroscience.
[68] Roland E. Suri, et al. Temporal Difference Model Reproduces Anticipatory Neural Activity, 2001, Neural Computation.
[69] W. Schultz, et al. Responses of Monkey Dopamine Neurons to External Stimuli: Changes with Learning, 1991.
[70] S. M. Alessi, et al. Pathological gambling severity is associated with impulsivity in a delay discounting procedure, 2003, Behavioural Processes.
[71] W. Schultz. Getting Formal with Dopamine and Reward, 2002, Neuron.
[72] D. Whitteridge. Lectures on Conditioned Reflexes, 1942, Nature.
[73] Z. Kurth-Nelson, et al. Neural Models of Temporal Discounting, 2009.
[74] R. Rescorla. A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement, 1972.
[75] Saori C. Tanaka, et al. Serotonin and the Evaluation of Future Rewards, 2007, Annals of the New York Academy of Sciences.
[76] W. F. Prokasy, et al. Classical conditioning II: Current research and theory, 1972.
[77] Kenji Doya, et al. What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?, 1999, Neural Networks.
[78] N. Mackintosh. The psychology of animal learning, 1974.
[79] Warren B. Powell, et al. Handbook of Learning and Approximate Dynamic Programming, 2006, IEEE Transactions on Automatic Control.
[80] J. Hollerman, et al. Dopamine neurons report an error in the temporal prediction of reward during learning, 1998, Nature Neuroscience.
[81] Joel L. Davis, et al. Models of Information Processing in the Basal Ganglia, 2008.
[82] Yael Niv, et al. Operant conditioning, 1974, Scholarpedia.
[83] Sham M. Kakade, et al. Opponent interactions between serotonin and dopamine, 2002, Neural Networks.
[84] Peter Dayan, et al. Dopamine: generalization and bonuses, 2002, Neural Networks.
[85] Florentin Wörgötter, et al. Temporal Sequence Learning, Prediction, and Control: A Review of Different Models and Their Relation to Biological Mechanisms, 2005, Neural Computation.
[86] P. Kaplan, et al. Bridging temporal gaps between CS and US in autoshaping: A test of a local context hypothesis, 1984.
[87] G. Bock, et al. Characterizing human psychological adaptations, 1997.
[88] Jonathan D. Cohen, et al. Neuroeconomics: cross-currents in research on decision-making, 2006, Trends in Cognitive Sciences.
[89] Mitsuo Kawato, et al. Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.
[90] W. Newsome, et al. The temporal precision of reward prediction in dopamine neurons, 2008, Nature Neuroscience.
[91] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[92] W. Pan, et al. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network, 2005, The Journal of Neuroscience.
[93] Zeb Kurth-Nelson, et al. Neural models of delay discounting, 2010.
[94] W. B. Levy, et al. A sequence predicting CA3 is a flexible associator that learns and uses context to solve hippocampal-like tasks, 1996, Hippocampus.
[95] R. F. Thompson, et al. Hippocampus and trace conditioning of the rabbit's classically conditioned nictitating membrane response, 1986, Behavioral Neuroscience.
[96] G. Ainslie. Breakdown of will, 2001.
[97] R. Church, et al. Scalar expectancy theory and choice between delayed rewards, 1988, Psychological Review.
[98] Nikolaus R. McFarland, et al. Striatonigrostriatal Pathways in Primates Form an Ascending Spiral from the Shell to the Dorsolateral Striatum, 2000, The Journal of Neuroscience.
[99] Joel L. Davis, et al. Macro-organization of the Circuits Connecting the Basal Ganglia with the Cortical Motor Areas, 1994.
[100] Richard Bellman, et al. On a Routing Problem, 1958.
[101] Samuel M. McClure, et al. Separate Neural Systems Value Immediate and Delayed Monetary Rewards, 2004, Science.
[102] E. Bullmore, et al. Society for Neuroscience Abstracts, 1997.
[103] W. Schultz, et al. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli, 1996, Nature.
[104] R. Wightman, et al. Extinction of Cocaine Self-Administration Reveals Functionally and Temporally Distinct Dopaminergic Signals in the Nucleus Accumbens, 2005, Neuron.
[105] Richard S. Sutton, et al. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System, 2008, Neural Computation.
[106] K. Doya. Metalearning, neuromodulation, and emotion, 2000.
[107] David Self. Neurobiology: Dopamine as chicken and egg, 2003, Nature.
[108] S. Mahadevan, et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning, 1999.
[109] Peter Dayan, et al. Motivated Reinforcement Learning, 2001, NIPS.
[110] A. Dickinson, et al. Reward-related signals carried by dopamine neurons, 1995.
[111] Alexandre Pouget, et al. Probabilistic Interpretation of Population Codes, 1996, Neural Computation.
[112] R. Wightman, et al. Subsecond dopamine release promotes cocaine seeking, 2003, Nature.
[113] J. O'Doherty, et al. Reward representations and reward-related learning in the human brain: insights from neuroimaging, 2004, Current Opinion in Neurobiology.
[114] R. Wightman, et al. Dopamine Operates as a Subsecond Modulator of Food Seeking, 2004, The Journal of Neuroscience.
[115] Saori C. Tanaka, et al. Low-Serotonin Levels Increase Delayed Reward Discounting in Humans, 2008, The Journal of Neuroscience.
[116] D. Rubin, et al. One Hundred Years of Forgetting: A Quantitative Description of Retention, 1996.
[117] P. Dayan, et al. Dopamine, uncertainty and TD learning, 2005, Behavioral and Brain Functions.
[118] B. Reynolds. A review of delay-discounting research with humans: relations to drug use and gambling, 2006, Behavioural Pharmacology.
[119] A. David Redish, et al. Measuring distributed properties of neural representations beyond the decoding of local variables: Implications for cognition, 2008.
[120] C. Pennartz, et al. Is a bird in the hand worth two in the future? The neuroeconomics of intertemporal decision-making, 2008, Progress in Neurobiology.