Finding minimal action sequences with a simple evaluation of actions