Learning Robust Helpful Behaviors in Two-Player Cooperative Atari Environments

We initiate the study of helpful behavior in the setting of two-player Atari games, suitably modified to provide cooperative incentives. Our main interest is in understanding whether reinforcement learning can be used to achieve robust, helpful behavior, in which one agent is trained to help a second, partner agent. Robustness requires the helpful AI to cooperate effectively with a diverse set of partners. We study this question with both artificial partner agents and human participants, introducing a new, web-based framework for the study of human-with-AI behavior. We obtain positive results in both Space Invaders and Fall Down, as well as successful transfer to human partners, including participants who are asked to deliberately behave in unexpected ways.
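
To make the robustness idea concrete, the sketch below shows one common way such a helper can be trained: each episode the helper is paired with a partner sampled from a diverse pool and is updated on the shared, cooperative return. This is a minimal toy example in plain NumPy, not the paper's implementation or its Atari environments; the environment (`ToyCoopEnv`), the random partner pool, and all hyperparameters are assumptions made purely for illustration.

```python
# Toy sketch of partner-randomized training for a "helper" agent (illustrative only;
# the real setup in the paper uses two-player Atari games and deep RL).
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HORIZON = 8, 4, 20


class ToyCoopEnv:
    """Stand-in two-player cooperative environment (hypothetical, not the Atari setup)."""

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a_helper, a_partner):
        # Shared reward when the joint action matches a state-dependent target.
        reward = 1.0 if (a_helper + a_partner) % N_ACTIONS == self.s % N_ACTIONS else 0.0
        self.s = (self.s + 1) % N_STATES
        return self.s, reward


def sample_action(logits):
    # Softmax sampling; returns the sampled action and the action probabilities.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p), p


# A pool of diverse, fixed partner policies (random tabular policies here).
partner_pool = [rng.normal(size=(N_STATES, N_ACTIONS)) for _ in range(5)]

# Helper policy: tabular logits trained with vanilla REINFORCE on the shared return.
helper_logits = np.zeros((N_STATES, N_ACTIONS))
lr = 0.1

for episode in range(2000):
    partner = partner_pool[rng.integers(len(partner_pool))]  # sample a partner each episode
    env = ToyCoopEnv()
    s = env.reset()
    grads, ep_return = [], 0.0
    for _ in range(HORIZON):
        a_h, p_h = sample_action(helper_logits[s])
        a_p, _ = sample_action(partner[s])
        # Score-function gradient of log pi(a_h | s) for a softmax policy.
        g = -p_h
        g[a_h] += 1.0
        grads.append((s, g))
        s, r = env.step(a_h, a_p)
        ep_return += r
    for s_t, g in grads:
        helper_logits[s_t] += lr * ep_return * g  # REINFORCE update on the shared return
```

Because the partner is resampled every episode, the helper cannot overfit to any single partner's quirks; the same principle, scaled up to deep RL and pixel observations, is what supports transfer to previously unseen artificial and human partners.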
