A framework for meta-level control in multi-agent systems

Sophisticated agents operating in open environments must make decisions that efficiently trade off the use of their limited resources between deliberative actions and domain actions. This is the meta-level control problem for agents operating in resource-bounded multi-agent environments. Control activities involve decisions on when to invoke, and how much effort to put into, the scheduling and coordination of domain activities. The focus of this paper is how to make effective meta-level control decisions. We show that meta-level control with bounded computational overhead allows complex agents to solve problems more efficiently than current approaches in dynamic, open multi-agent environments. The meta-level control approach we present is based on the decision-theoretic use of an abstract representation of the agent state. This abstraction concisely captures the information critical to decision making while bounding the cost of meta-level control, and it is suitable for automatically learning meta-level control policies.
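The idea of learning a meta-level control policy over an abstract agent state can be sketched with tabular Q-learning. Everything below is illustrative: the state features (`tight_deadline`/`loose_deadline`), the meta-action names, and the reward scheme are assumptions for the sketch, not the paper's actual formulation.

```python
import random
from collections import defaultdict

# Hypothetical meta-level actions: how much deliberation to invest
# before acting in the domain.
META_ACTIONS = ["schedule_detailed", "schedule_quick", "drop_task"]

class MetaController:
    """Tabular Q-learning over a coarse, abstract agent state.

    Keeping the state abstract (a few discretized features) bounds both
    the cost of evaluating the meta-level policy and the size of the
    table to be learned.
    """
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy selection among meta-level actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(META_ACTIONS)
        return max(META_ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Standard one-step Q-learning backup.
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in META_ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (
            target - self.q[(state, action)])

# Toy training loop: under a tight deadline, detailed scheduling wastes
# deliberation time, so cheap scheduling earns more reward; under a loose
# deadline the reverse holds.
mc = MetaController()
rng = random.Random(1)
for _ in range(3000):
    state = rng.choice(["tight_deadline", "loose_deadline"])
    action = mc.choose(state)
    if state == "tight_deadline":
        reward = 1.0 if action == "schedule_quick" else 0.0
    else:
        reward = 1.0 if action == "schedule_detailed" else 0.0
    mc.update(state, action, reward, next_state=None, done=True)

greedy = {s: max(META_ACTIONS, key=lambda a: mc.q[(s, a)])
          for s in ["tight_deadline", "loose_deadline"]}
print(greedy)
```

After training, the greedy policy matches deliberation effort to deadline tightness, which is the essence of the meta-level control decision the abstract describes.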
