Multi-robot coordination and competition using mixed integer and linear programs

Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs) are preferred methods for representing complex, uncertain dynamic systems and for determining an optimal control policy that manipulates the system in the desired manner. Until recently, controlling a system composed of multiple agents with the MDP methodology was intractable, because the size of the MDP problem representation grows exponentially with the number of agents. In this thesis, a novel method for solving large multi-agent MDP systems is presented which avoids this exponential size increase while still providing optimal policies for a large class of useful problems. This thesis provides the following main contributions:

- A novel description language for multi-agent MDPs. We develop two modeling techniques for representing multi-agent MDP (MAMDP) coordination problems. The first phrases the problem as a linear program, which avoids the exponential state-space size of multi-agent MDPs. The second, more expressive representation extends the linear program with integer constraints. For many problems, these representations capture the original problem exactly with exponentially fewer variables and constraints, leading to an efficient and optimal solution of the multi-agent MDP.

- A novel coordination method for multi-agent MDPs. We use the Dantzig-Wolfe decomposition technique and the branch and bound method to solve the above models efficiently. These solution methods overcome significant drawbacks in related work. We develop a multi-robot towing and foraging model, devise a novel multi-agent path planner, and solve coordination problems with a quadratic cost on a global resource.

- A method to determine the optimal strategies of competing teams. One team of players is allowed to determine the cost function of the second team's MAMDP. This allows us to solve a zero-sum game played by both teams: one team chooses cost functions and the other coordinates to find the least-cost plan given the opposing team's possible strategies. Each team is coordinated using the newly developed multi-agent coordination method above.

- A game of robot paintball. Using the above techniques we solve a game of multi-robot paintball, played with real robots at high speed.
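The linear-programming view of an MDP mentioned in the first contribution can be illustrated on a single-agent problem. The sketch below solves the standard dual ("occupancy measure") LP of a discounted MDP with `scipy.optimize.linprog`; the two-state, two-action MDP, its transition matrices, and its costs are invented for illustration and are not taken from the thesis, which works with much larger, structured multi-agent models.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny illustrative MDP (all numbers are made up for this sketch):
# 2 states, 2 actions, discount factor gamma.
nS, nA, gamma = 2, 2, 0.9
# P[a][s, s'] = probability of moving s -> s' under action a.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.1, 0.9], [0.7, 0.3]])]   # action 1
c = np.array([[1.0, 4.0],                   # per-step costs in state 0
              [3.0, 0.5]])                  # per-step costs in state 1
mu0 = np.array([0.5, 0.5])                  # initial state distribution

# Dual (occupancy-measure) LP:
#   min  sum_{s,a} c(s,a) x(s,a)
#   s.t. sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu0(s'),
#        x >= 0
# Variables x(s,a) are flattened in (state, action) order.
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for a in range(nA):
        A_eq[sp, sp * nA + a] += 1.0            # outflow from s'
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] -= gamma * P[a][s, sp]  # discounted inflow
res = linprog(c.flatten(), A_eq=A_eq, b_eq=mu0, bounds=(0, None))

# An optimal deterministic policy takes, in each state, the action
# carrying the positive occupancy mass.
x = res.x.reshape(nS, nA)
policy = x.argmax(axis=1)
print("occupancy measures:\n", x)
print("optimal deterministic policy:", policy)
```

A useful sanity check on any solution: summing the flow constraints over all states shows the total occupancy mass must equal 1/(1 - gamma), here 10.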

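The zero-sum interaction in the third contribution, where one team picks cost functions and the other finds a least-cost plan against them, reduces in its simplest form to a matrix game solvable by a linear program. The sketch below computes the planner's minimax mixed strategy for a small cost matrix; the 2x2 payoffs are invented for illustration, and the thesis itself solves far larger games over MAMDP plans rather than explicit matrices.

```python
import numpy as np
from scipy.optimize import linprog

# A[i, j] = cost to the planning team when it commits to plan i and the
# adversary team selects cost function j (numbers invented for this sketch).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
m, n = A.shape

# The planner minimizes its worst-case expected cost v over mixed
# strategies p:
#   min v  s.t.  A^T p <= v * 1,  sum(p) = 1,  p >= 0
# LP variables are [p_1 .. p_m, v].
c = np.zeros(m + 1)
c[-1] = 1.0
A_ub = np.hstack([A.T, -np.ones((n, 1))])        # A^T p - v <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p, v = res.x[:m], res.x[-1]
print("planner mixed strategy:", p, "game value:", v)
```

For this matrix the planner must randomize: mixing the two plans equally balances the adversary's two cost-function choices at a game value of 1.5, which neither pure plan achieves.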