Imitation learning for task allocation

At the heart of multi-robot task allocation lies the ability to compare multiple options in order to select the best one. In some domains this utility evaluation is not straightforward, for example because of complex, unmodeled underlying dynamics or an adversary in the environment. Explicitly modeling these extrinsic influences well enough to account for them in utility computation (and thus in task allocation) may be intractable, but a human expert may be able to quickly gain some intuition about the form of the desired solution. We propose to harness the expert's intuition by applying imitation learning to the multi-robot task allocation domain. Using a market-based method, we steer the allocation process by biasing prices in the market according to a policy learned from a set of demonstrated allocations (the expert's solutions to a number of domain instances). We present results in two distinct domains: a disaster response scenario in which a team of agents must put out fires that are spreading between buildings, and an adversarial game in which teams must make complex strategic decisions to score more points than their opponents.
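To make the idea of biasing market prices concrete, here is a minimal sketch. It is not the authors' implementation: the linear form of the bias, the greedy one-task-per-robot auction, and all names (`biased_price`, `greedy_auction`, `COST`, `URGENCY`) are assumptions chosen for illustration. The weights stand in for a policy that would be learned from demonstrated allocations; here they are set by hand.

```python
from typing import Callable, Dict, List, Sequence

def biased_price(base_cost: float, features: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Base cost plus a linear bias term (hypothetical form of the learned policy)."""
    return base_cost + sum(w * f for w, f in zip(weights, features))

def greedy_auction(robots: List[str], tasks: List[str],
                   cost: Callable[[str, str], float],
                   features: Callable[[str, str], Sequence[float]],
                   weights: Sequence[float]) -> Dict[str, str]:
    """Repeatedly award the (robot, task) pair with the lowest biased price.

    Each robot receives at most one task in this simplified market.
    """
    allocation: Dict[str, str] = {}
    free_robots, open_tasks = list(robots), list(tasks)
    while free_robots and open_tasks:
        _, robot, task = min(
            (biased_price(cost(r, t), features(r, t), weights), r, t)
            for r in free_robots for t in open_tasks
        )
        allocation[robot] = task
        free_robots.remove(robot)
        open_tasks.remove(task)
    return allocation

# Hypothetical toy instance: travel costs and a single "urgency" feature per task.
COST = {("r1", "A"): 1.0, ("r1", "B"): 5.0}
URGENCY = {"A": 0.0, "B": 1.0}

def cost(r: str, t: str) -> float:
    return COST[(r, t)]

def features(r: str, t: str) -> Sequence[float]:
    return [URGENCY[t]]

unbiased = greedy_auction(["r1"], ["A", "B"], cost, features, weights=[0.0])
biased = greedy_auction(["r1"], ["A", "B"], cost, features, weights=[-10.0])
```

With zero weights the robot takes the cheaper task `A`; a negative weight on urgency lowers the price of task `B` enough that the market awards it instead. In a learned setting, the weights would be fit so that the market reproduces the expert's demonstrated allocations.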
