Generating Adaptive Behaviour within a Memory-Prediction Framework

The Memory-Prediction Framework (MPF) and its Hierarchical Temporal Memory implementation (HTM) have been widely applied to unsupervised learning problems, for both classification and prediction. To date, there has been no attempt to incorporate MPF/HTM in reinforcement learning or other adaptive systems; that is, to use knowledge embodied within the hierarchy to control a system, or to generate behaviour for an agent. This problem is interesting because the human neocortex is believed to play a vital role in the generation of behaviour, and the MPF is a model of the human neocortex. We propose some simple and biologically plausible enhancements to the Memory-Prediction Framework. These cause it to explore and interact with an external world, while trying to maximise a continuous, time-varying reward function. All behaviour is generated and controlled within the MPF hierarchy. The hierarchy develops from a random initial configuration by interaction with the world and reinforcement learning only. Among other demonstrations, we show that a 2-node hierarchy can learn to successfully play "rocks, paper, scissors" against a predictable opponent.
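The "rocks, paper, scissors" demonstration rests on a simple principle: an agent that learns to predict a patterned opponent can select the counter-move to its prediction. The sketch below is not the authors' MPF hierarchy; it is a minimal stand-in that captures only the predict-then-act loop, using first-order transition counts (the opponent's next move given its last) in place of a learned hierarchy. The cycling opponent, the move names, and the reward values (+1 win, 0 draw, -1 loss) are illustrative assumptions.

```python
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key

def opponent(t):
    # Predictable opponent: cycles rock -> paper -> scissors deterministically.
    return MOVES[t % 3]

def reward(agent_move, opp_move):
    # +1 for a win, 0 for a draw, -1 for a loss (illustrative reward values).
    if agent_move == opp_move:
        return 0
    return 1 if BEATS[opp_move] == agent_move else -1

# Transition counts: counts[prev][nxt] = times the opponent followed prev with nxt.
counts = {m: {n: 0 for n in MOVES} for m in MOVES}

total = 0
prev = opponent(0)
for t in range(1, 300):
    # Predict the opponent's next move from the learned transitions,
    # then play the move that beats the prediction.
    pred = max(MOVES, key=lambda n: counts[prev][n])
    agent_move = BEATS[pred]
    opp_move = opponent(t)
    total += reward(agent_move, opp_move)
    counts[prev][opp_move] += 1   # reinforce the observed transition
    prev = opp_move

print(total)  # cumulative reward; the agent wins every round once the cycle is learned
```

After a handful of exploratory rounds the transition table fully describes the cycle, and the agent wins every subsequent round. The paper's contribution is to obtain this kind of behaviour from within the MPF hierarchy itself, rather than from a hand-built predictor like the one above.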
