Paper Information - A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

Abstract: This paper reviews the tripartite relationship between novelty, intrinsic motivation and reinforcement learning. It first surveys the literature on novelty and the different computational models of novelty detection, focusing on the features of stimuli that trigger a hedonic value for generating a novelty signal. It then gives an overview of intrinsic motivation and examines different models, with the aim of exploring the relationships between specific features of a novelty signal and their effect on intrinsic motivation in producing a reward function. Finally, it surveys reinforcement learning, its different models, and their functional relationship with intrinsic motivation.
