Implicit Imitation in Multiagent Reinforcement Learning

Imitation is actively being studied as an effective means of learning in multiagent environments. It allows an agent to learn how to act well (perhaps optimally) by passively observing the actions of cooperative teachers or other, more experienced agents in its environment. We propose a straightforward imitation mechanism called model extraction that can be integrated easily into standard model-based reinforcement learning algorithms. Roughly, by observing a mentor with similar capabilities, an agent can extract information about its own capabilities in unvisited parts of the state space. The extracted information can accelerate learning dramatically. We illustrate the benefits of model extraction by integrating it with prioritized sweeping, and demonstrate improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability, possible interactions, and common abilities, we briefly comment on extensions of the model that relax these.
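The mechanism is described only at a high level above, so the following is a minimal tabular sketch of the idea rather than the authors' algorithm. Names such as ImplicitImitationLearner, augmented_backup, and MENTOR_CONFIDENCE are illustrative assumptions, and the sketch assumes a fully observable discrete environment in which the learner can see the mentor's state transitions (but not its actions or rewards).

```python
from collections import defaultdict

# Hedged sketch of "model extraction" for implicit imitation: the learner
# keeps counts of its own transitions and of the mentor's observed state
# transitions, and uses the mentor's empirical transition distribution as
# an additional candidate model when backing up values in states it has
# rarely visited itself. Names and thresholds are illustrative only.

GAMMA = 0.95
MENTOR_CONFIDENCE = 5   # mentor observations in a state before trusting them

class ImplicitImitationLearner:
    def __init__(self, actions):
        self.actions = actions
        self.V = defaultdict(float)                                  # value estimates
        self.R = defaultdict(float)                                  # observed rewards
        self.own_counts = defaultdict(lambda: defaultdict(int))      # (s, a) -> s' -> count
        self.mentor_counts = defaultdict(lambda: defaultdict(int))   # s -> s' -> count

    def observe_self(self, s, a, r, s2):
        # Update the agent's own empirical transition and reward model.
        self.own_counts[(s, a)][s2] += 1
        self.R[s] = r

    def observe_mentor(self, s, s2):
        # The mentor's action is not observed; only its state transition is.
        self.mentor_counts[s][s2] += 1

    def _expected_value(self, counts):
        # Expected next-state value under an empirical transition distribution.
        n = sum(counts.values())
        if n == 0:
            return None
        return sum(c / n * self.V[s2] for s2, c in counts.items())

    def augmented_backup(self, s):
        # Standard model-based backup over the agent's own actions...
        own = [self._expected_value(self.own_counts[(s, a)]) for a in self.actions]
        candidates = [v for v in own if v is not None]
        # ...augmented with the mentor's empirical transition distribution,
        # used only once the mentor has been seen in s often enough.
        if sum(self.mentor_counts[s].values()) >= MENTOR_CONFIDENCE:
            candidates.append(self._expected_value(self.mentor_counts[s]))
        if candidates:
            self.V[s] = self.R[s] + GAMMA * max(candidates)
```

In a prioritized sweeping integration, augmented_backup would not be applied uniformly; states would be placed on a priority queue keyed by the magnitude of their Bellman error, so that mentor observations in unvisited regions quickly propagate value back toward the states the learner actually occupies.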
