Instance-Based Online Learning of Deterministic Relational Action Models

We present an instance-based, online method for learning action models in unanticipated, relational domains. Our algorithm memorizes the pre- and post-states of the transitions an agent encounters as it experiences the environment, and makes predictions by using analogy to map the recorded transitions onto novel situations. The algorithm is implemented in the Soar cognitive architecture, integrating Soar's task-independent episodic memory module with analogical reasoning implemented in procedural memory. We evaluate the algorithm's prediction performance in a modified version of the blocks world domain and in the taxi domain. We also present a reinforcement learning agent that uses our model learning algorithm to significantly speed up its convergence to an optimal policy in the modified blocks world domain.
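To make the memorize-then-map idea concrete, the following is a minimal Python sketch, not the Soar implementation described above. It assumes states are sets of relational facts such as ("on", "A", "B"), replaces episodic memory with a plain list, and replaces analogical reasoning with a brute-force search over object correspondences. The names InstanceModel, record, predict, objects, and rename are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def objects(facts):
    """Return the sorted object names appearing as arguments in a set of facts."""
    return sorted({arg for (_, *args) in facts for arg in args})


def rename(facts, mapping):
    """Rewrite every fact by substituting object names according to mapping."""
    return frozenset(
        (pred, *[mapping.get(a, a) for a in args]) for (pred, *args) in facts
    )


class InstanceModel:
    """Hypothetical instance-based model of deterministic relational actions."""

    def __init__(self):
        # Stand-in for episodic memory: a list of (pre, action, post) triples.
        self.memory = []

    def record(self, pre, action, post):
        """Memorize one experienced transition."""
        self.memory.append((frozenset(pre), action, frozenset(post)))

    def predict(self, state, action):
        """Predict the successor of state by mapping a stored transition onto it."""
        state = frozenset(state)
        best = None
        for pre, act, post in self.memory:
            # Only transitions of the same action type and arity are candidates.
            if act[0] != action[0] or len(act) != len(action):
                continue
            # Action arguments are forced to correspond to each other.
            forced = dict(zip(act[1:], action[1:]))
            free_src = [o for o in objects(pre | post) if o not in forced]
            free_dst = [o for o in objects(state) if o not in forced.values()]
            if len(free_src) > len(free_dst):
                continue
            # Brute-force search for the mapping that aligns the most facts.
            for perm in permutations(free_dst, len(free_src)):
                mapping = {**forced, **dict(zip(free_src, perm))}
                score = len(rename(pre, mapping) & state)
                if best is None or score > best[0]:
                    best = (score, mapping, pre, post)
        if best is None:
            return None  # nothing in memory resembles this situation
        _, mapping, pre, post = best
        # Transfer the stored change (removed and added facts) to the new objects.
        removed = rename(pre - post, mapping)
        added = rename(post - pre, mapping)
        return (state - removed) | added


# One remembered blocks-world transition: move A from B onto C.
model = InstanceModel()
model.record(
    pre={("on", "A", "B"), ("ontable", "B"), ("clear", "A"),
         ("ontable", "C"), ("clear", "C")},
    action=("move", "A", "C"),
    post={("on", "A", "C"), ("ontable", "B"), ("clear", "B"),
          ("clear", "A"), ("ontable", "C")},
)

# A novel situation with different block names but the same structure.
new_state = {("on", "X", "Y"), ("ontable", "Y"), ("clear", "X"),
             ("ontable", "Z"), ("clear", "Z")}
predicted = model.predict(new_state, ("move", "X", "Z"))
```

In this sketch a single stored transition is enough to predict the outcome for blocks with unseen names, because the prediction transfers the stored change rather than the stored state. The exhaustive search over permutations is exponential in the number of objects and is used here only for brevity.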
