A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at this https URL.
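The core idea behind SYNC-policies can be illustrated with a small sketch: the joint policy is a mixture of products of per-agent marginals, and the agents stay coherent by drawing the mixture component from a shared random stream while sampling their individual actions locally. The function and variable names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Both agents hold the same seed, so their "shared" draws agree exactly.
rng_shared = np.random.default_rng(0)

def sample_joint(alpha, marginals_1, marginals_2):
    """Sample one joint action from a mixture-of-marginals policy.

    alpha:        (K,)  mixture weights both agents agree on
    marginals_i:  (K, A) per-component action distributions for agent i

    Because the component index k comes from the SHARED stream, the joint
    distribution is sum_k alpha[k] * m1[k] (x) m2[k], which can express
    action correlations that a single product of two independent
    marginals cannot.
    """
    k = rng_shared.choice(len(alpha), p=alpha)                    # shared draw
    a1 = np.random.choice(len(marginals_1[k]), p=marginals_1[k])  # private draw
    a2 = np.random.choice(len(marginals_2[k]), p=marginals_2[k])  # private draw
    return a1, a2
```

For intuition: with two deterministic components (say, "both agents push" vs. "both agents rotate") and equal weights, every joint sample is perfectly correlated, even though each agent samples its own action without seeing the other's choice — exactly the kind of coordination a factorized marginal policy cannot represent.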
