Coaching: learning and using environment and agent models for advice

Coaching is a relationship in which one agent provides advice to another about how to act. This thesis explores a range of problems faced by an automated coach agent in providing advice to one or more automated advice-receiving agents. The coach's job is to help those agents perform as well as possible in their environment. We identify and address a set of technical challenges: How can the coach learn and use models of the environment? How should advice be adapted to the peculiarities of the advice receivers? How can opponents be modeled, and how can those models be used? How should advice be represented so that a team can use it effectively? This thesis serves both to define the coaching problem and to explore solutions to the challenges it poses.

The thesis is inspired by a simulated robot soccer environment in which a coach agent can provide advice to a team in a standard language. The author, in collaboration with others, developed this coach environment and standard language as the thesis progressed. The experiments in this thesis constitute the largest known empirical study in the simulated robot soccer environment. A predator-prey domain and a moving-maze environment are used for additional experimentation. All algorithms are implemented in at least one of these environments and validated empirically.

In addition to the coaching problem formulation and its decompositions, the thesis makes several main technical contributions:
(i) Several opponent model representations with associated learning algorithms, whose effectiveness is demonstrated in the robot soccer domain.
(ii) A study of the need for, and the effects of, coach learning under various limitations of the advice receiver and of the communication bandwidth.
(iii) The Multi-Agent Simple Temporal Network, a multi-agent plan representation that refines the Simple Temporal Network, with an associated distributed plan-execution algorithm (see the first sketch below).
(iv) Algorithms for learning an abstract Markov Decision Process from external observations, a given state abstraction, and partial abstract action templates; the use of the learned MDP for advice is explored in various scenarios (see the second sketch below).
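
To ground contribution (iii): a Multi-Agent Simple Temporal Network builds on the standard Simple Temporal Network, in which events are time points and each constraint bounds the difference between a pair of time points. The sketch below is a generic illustration of the underlying STN machinery, not the thesis's distributed algorithm: it checks consistency by running Floyd-Warshall over the network's distance graph. The function name and input encoding are assumptions made for this example.

```python
import math

def stn_consistent(num_events, constraints):
    """Check consistency of a Simple Temporal Network.

    `constraints` is a list of (i, j, lo, hi) tuples, each meaning
    lo <= t_j - t_i <= hi.  In the distance-graph encoding this
    becomes an edge i->j of weight hi and an edge j->i of weight
    -lo; the STN is consistent iff that graph has no negative
    cycle, which Floyd-Warshall detects below.  (Illustrative
    sketch; names and encoding are assumptions.)
    """
    INF = math.inf
    d = [[0.0 if i == j else INF for j in range(num_events)]
         for i in range(num_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo
    for k in range(num_events):       # all-pairs shortest paths
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative self-distance signals a negative cycle, i.e. an
    # unsatisfiable set of temporal constraints.
    return all(d[i][i] >= 0 for i in range(num_events))
```

After the closure, d[i][j] is the tightest implied upper bound on t_j - t_i, which is exactly the information a dispatcher consults when deciding online whether an event can still be scheduled.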
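
For contribution (iv), the core idea can be illustrated simply: the coach observes raw execution traces, maps each raw state through a given abstraction function, and estimates an abstract MDP's transition and reward models by counting; greedy actions in the learned model then become candidate advice. The following is a minimal sketch under assumed interfaces; the trajectory format, the `abstract` function, and every name here are illustrative rather than the thesis's implementation.

```python
from collections import defaultdict

def learn_abstract_mdp(trajectories, abstract):
    """Estimate abstract transition probabilities P[(z, a)][z'] and
    mean rewards R[(z, a)] from (state, action, reward, next_state)
    tuples, where `abstract` maps a raw state to an abstract one.
    (Illustrative sketch; interfaces are assumptions.)"""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for trajectory in trajectories:
        for s, a, r, s_next in trajectory:
            z, z_next = abstract(s), abstract(s_next)
            counts[(z, a)][z_next] += 1
            reward_sums[(z, a)] += r
    P, R = {}, {}
    for (z, a), nexts in counts.items():
        total = sum(nexts.values())
        P[(z, a)] = {zn: c / total for zn, c in nexts.items()}
        R[(z, a)] = reward_sums[(z, a)] / total
    return P, R

def greedy_advice(P, R, gamma=0.9, sweeps=200):
    """Value-iterate the learned abstract MDP and return, per
    abstract state, the action of highest estimated value."""
    actions = defaultdict(list)
    for (z, a) in P:
        actions[z].append(a)
    V = defaultdict(float)  # unseen abstract states default to 0

    def q(z, a):
        return R[(z, a)] + gamma * sum(p * V[zn] for zn, p in P[(z, a)].items())

    for _ in range(sweeps):
        for z, acts in actions.items():
            V[z] = max(q(z, a) for a in acts)
    return {z: max(acts, key=lambda a: q(z, a)) for z, acts in actions.items()}
```

A coach could then translate the returned mapping into advice of the form "in situations matching z, prefer action a", subject to whatever advice language the receivers understand.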
