Learning and control of exploration primitives

Animals explore novel environments cautiously, alternating between curiosity-driven behavior and retreats. We present a detailed formal framework for exploration that generates behavior maintaining a constant level of novelty. Like other complex behaviors, the resulting exploratory behavior is composed of exploration motor primitives. These primitives can be learned during a developmental period in which the agent repeatedly interacts with environments that share common traits, allowing motor learning to transfer to novel environments. The exploration motor primitives emerge through reinforcement learning in which information gain serves as the intrinsic reward. Furthermore, the actors and critics are local and egocentric, which enables transfer to other environments. Novelty control, i.e., the principle that governs the maintenance of constant novelty, is implemented by a central action-selection mechanism that switches between the emergent exploration primitives and a retreat policy based on the currently experienced novelty. The framework has only a few parameters; its time-scales, learning rates and thresholds are adaptive, so it can be easily applied to many scenarios. We implement the framework in a model of the rodent whisking system and show that it can explain characteristic observed behaviors. The paper concludes with a detailed discussion of the framework’s merits and flaws compared with related models.
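The novelty-control principle described above can be illustrated with a minimal Python sketch. Every modeling choice in it is an assumption made for illustration, not the authors' model: the Beta-Bernoulli belief about whisker contact, the use of the KL divergence between predictive distributions as a simplified information gain, the leaky novelty trace with time constant `tau`, the fixed `set_point` (the paper describes its thresholds as adaptive), and the one-dimensional toy track. All names (`BetaBelief`, `NoveltyController`, `run`, `bernoulli_kl`) are hypothetical.

```python
import math
from dataclasses import dataclass


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)) in nats; p and q must lie strictly between 0 and 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


@dataclass
class BetaBelief:
    """Hypothetical Beta(alpha, beta) belief about the probability of contact at one location."""
    alpha: float = 1.0
    beta: float = 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, contact: bool) -> float:
        """Bayesian update on one observation; returns the information gain,
        simplified here to KL(posterior predictive || prior predictive)."""
        p_prior = self.mean()
        if contact:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        return bernoulli_kl(self.mean(), p_prior)


class NoveltyController:
    """Central action selection (assumed rule): retreat when the novelty trace
    exceeds the set point, otherwise run the exploration primitive."""

    def __init__(self, set_point: float, tau: float = 5.0) -> None:
        self.set_point = set_point          # fixed here; adaptive in the paper
        self.decay = math.exp(-1.0 / tau)   # tau = time-scale of the leaky novelty trace
        self.novelty = 0.0

    def observe(self, info_gain: float) -> None:
        # Leaky integration of the information gain experienced at each step.
        self.novelty = self.decay * self.novelty + (1.0 - self.decay) * info_gain

    def select(self) -> str:
        return "retreat" if self.novelty > self.set_point else "explore"


def run(steps: int = 30, set_point: float = 0.005) -> list[tuple[str, int]]:
    """Toy 1-D track: 'explore' moves one position outward, 'retreat' moves back
    toward the familiar home position 0. Contact at every third position is arbitrary."""
    beliefs: dict[int, BetaBelief] = {}
    ctrl = NoveltyController(set_point)
    pos, trace = 0, []
    for _ in range(steps):
        action = ctrl.select()
        pos = pos + 1 if action == "explore" else max(pos - 1, 0)
        contact = pos % 3 == 0
        gain = beliefs.setdefault(pos, BetaBelief()).update(contact)
        ctrl.observe(gain)
        trace.append((action, pos))
    return trace


if __name__ == "__main__":
    for action, pos in run():
        print(f"{action:8s} position={pos}")
```

Because repeated observations of the same location yield ever smaller information gain, retreating to familiar ground lets the novelty trace decay until exploration resumes, so the trace hovers around the set point. In this sketch `set_point` and `tau` stand in for the threshold and time-scale that the abstract says are adaptive in the actual framework.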
