Programming and Deployment of Autonomous Swarms using Multi-Agent Reinforcement Learning

Autonomous systems (AS) carry out complex missions by continuously observing the state of their surroundings and taking actions toward a goal. Swarms of AS working together can complete missions faster and more effectively than single AS alone. To build swarms today, developers handcraft their own software for storing, aggregating, and learning from observations. We present the Fleet Computer, a platform for developing and managing swarms. The Fleet Computer provides a programming paradigm that simplifies multi-agent reinforcement learning (MARL), an emerging class of algorithms that coordinate swarms of agents. Using just two programmer-provided functions, Map() and Eval(), the Fleet Computer compiles and deploys swarms and continuously updates the reinforcement learning models that govern actions. To conserve compute resources, the Fleet Computer gives priority scheduling to models that contribute to effective actions, drawing a novel link between online learning and resource management. We developed swarms for unmanned aerial vehicles (UAV) in agriculture and for video analytics on urban traffic. Compared to individual AS, our swarms achieved speedups of 4.4X using 4 UAV and 62X using 130 video cameras. Compared to a competing approach for building swarms that is widely used in practice, our swarms were 3X more effective while using 3.9X less energy.
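
The abstract names the two programmer-provided functions, Map() and Eval(), but does not show their interface. The sketch below is a minimal illustration of how such a pair might look for the UAV crop-scouting mission mentioned above; the signatures, the greenness features, and the coverage-based reward are all assumptions for illustration, not the Fleet Computer's actual API.

```python
# Hypothetical sketch of the two programmer-provided functions described in
# the abstract. All names and signatures here are illustrative assumptions;
# the paper's real interface is not shown in this excerpt.

import numpy as np

def Map(observation: np.ndarray) -> np.ndarray:
    """Reduce a raw observation (assumed here to be an HxWx3 aerial RGB
    frame) to a compact state vector for the shared MARL models.

    For a crop-scouting mission we summarize the frame by two crude
    vegetation features derived from the green channel.
    """
    greenness = observation[..., 1].mean() / 255.0   # normalized mean green intensity
    coverage = (observation[..., 1] > 128).mean()    # fraction of "green" pixels
    return np.array([greenness, coverage])

def Eval(state: np.ndarray, prev_state: np.ndarray) -> float:
    """Score the most recent action as a reinforcement learning reward.

    This toy reward favors actions that increase observed vegetation
    coverage; the platform would use such a signal to keep updating the
    models that govern each agent's next action.
    """
    return float(state[1] - prev_state[1])  # reward gains in coverage
```

In this reading, Map() plays a role analogous to the map step in MapReduce, turning each agent's raw observations into a shared state representation, while Eval() supplies the reward signal that drives the online model updates.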
