A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

Abstract: This paper reviews the tripartite relationship between novelty, intrinsic motivation, and reinforcement learning. It first surveys the literature on novelty and on computational models of novelty detection, focusing on the features of stimuli that trigger a hedonic value and thereby generate a novelty signal. It then gives an overview of intrinsic motivation and examines different models of it, with the aim of exploring how specific features of a novelty signal relate to its effect on intrinsic motivation in producing a reward function. Finally, it surveys reinforcement learning, its different models, and their functional relationship with intrinsic motivation.
