Towards Deployment of Robust Cooperative AI Agents: An Algorithmic Framework for Learning Adaptive Policies

We study the problem of designing an AI agent that can robustly cooperate with agents of unknown type (i.e., previously unobserved behavior) in multi-agent scenarios. Our work is inspired by real-world applications in which an AI agent, e.g., a virtual assistant, has to cooperate with new types of agents or users after its deployment. We model this problem via parametric Markov Decision Processes in which the parameters correspond to a user’s type and characterize her behavior. In the test phase, the AI agent has to interact with a user of an unknown type. We develop an algorithmic framework for learning adaptive policies: our approach relies on observing the user’s actions to infer the user’s type and then adapting the policy to facilitate efficient cooperation. We show that, without being adaptive, an AI agent can end up performing arbitrarily badly in the test phase. Using our framework, we propose two concrete algorithms for computing policies that automatically adapt to the user in the test phase. We demonstrate the effectiveness of our algorithms in a cooperative gathering game environment for two agents.
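
The idea of inferring a user's type from observed actions and then adapting the policy can be illustrated with a small Bayesian sketch. This is not the paper's algorithm: it assumes a finite set of candidate types, a known action likelihood per type, and a fixed best response per type. All names and numbers (`ACTION_LIKELIHOOD`, `BEST_RESPONSE`, the type labels) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical model: P(user action | user type) for two candidate types.
ACTION_LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "prefers_left": {"left": 0.8, "right": 0.2},
    "prefers_right": {"left": 0.2, "right": 0.8},
}

# Hypothetical best response of the AI agent to each user type.
BEST_RESPONSE: Dict[str, str] = {
    "prefers_left": "go_right",
    "prefers_right": "go_left",
}


def update_belief(belief: Dict[str, float], action: str) -> Dict[str, float]:
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalized = {
        user_type: prior * ACTION_LIKELIHOOD[user_type][action]
        for user_type, prior in belief.items()
    }
    total = sum(unnormalized.values())
    return {user_type: p / total for user_type, p in unnormalized.items()}


def infer_type(actions: List[str]) -> Dict[str, float]:
    """Start from a uniform prior and fold in each observed user action."""
    n_types = len(ACTION_LIKELIHOOD)
    belief = {user_type: 1.0 / n_types for user_type in ACTION_LIKELIHOOD}
    for action in actions:
        belief = update_belief(belief, action)
    return belief


def adapt_policy(belief: Dict[str, float]) -> str:
    """Act according to the best response for the most probable type."""
    most_likely = max(belief, key=belief.get)
    return BEST_RESPONSE[most_likely]


belief = infer_type(["left", "left", "right"])
chosen_action = adapt_policy(belief)
```

A non-adaptive agent would commit to one response regardless of the observations, whereas here the chosen action changes as the belief over types shifts. Acting on the single most probable type is a simplification; other choices, such as maximizing expected reward under the full belief, are equally plausible.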