Learning Suboptimal Broadcasting Intervals in Multi-Agent Systems

Abstract In this paper, agents learn how often to exchange information with their neighbors in cooperative Multi-Agent Systems (MASs) so that their user-defined cost functions are minimized. The cost functions capture the trade-off between local control performance and the energy consumption of each agent in the presence of exogenous disturbances. Agent energy consumption, which is critical for prolonging the MAS mission, comprises both control effort (e.g., acceleration, velocity) and communication effort. The proposed methodology first computes upper bounds on asynchronous broadcasting intervals that provably stabilize the MAS. These upper bounds then serve as optimization constraints for an online learning algorithm based on Least-Squares Policy Iteration (LSPI), which minimizes the cost function of each agent. Consequently, the obtained broadcasting intervals adapt to the most recent information received from neighbors (e.g., delayed and noisy inputs and/or outputs of the agents) while provably stabilizing the MAS. Chebyshev polynomials serve as the function approximator in LSPI, while Kalman Filtering (KF) handles sampled, corrupted, and delayed data. The methodology is illustrated on a consensus control problem with general linear agent dynamics.
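
To make the learning step concrete, the following is a minimal sketch of LSPI with a Chebyshev polynomial approximator, choosing among candidate broadcasting intervals that never exceed a stabilizing upper bound. It is an illustration under stated assumptions, not the paper's implementation: the scalar error state, the toy dynamics and cost in `step`, the bound `TAU_MAX`, and all numerical values are hypothetical, the variant shown is batch rather than online, and the Kalman filtering stage is omitted.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

TAU_MAX = 0.5                          # assumed stabilizing upper bound (hypothetical value)
TAUS = np.linspace(0.1, TAU_MAX, 5)    # candidate intervals, all <= TAU_MAX by construction
DEGREE, GAMMA = 3, 0.9                 # Chebyshev degree and discount factor (assumed)

def phi(x, a):
    """Chebyshev features T_0..T_DEGREE of state x in [-1, 1], placed in the block of action a."""
    out = np.zeros(len(TAUS) * (DEGREE + 1))
    block = C.chebvander(np.atleast_1d(np.clip(x, -1.0, 1.0)), DEGREE)[0]
    out[a * (DEGREE + 1):(a + 1) * (DEGREE + 1)] = block
    return out

def greedy(w, x):
    """Index of the candidate interval with the smallest approximate cost-to-go."""
    return int(np.argmin([phi(x, a) @ w for a in range(len(TAUS))]))

def step(x, a, rng):
    """Toy model (assumption): the error drifts for tau seconds and is contracted at a broadcast."""
    tau = TAUS[a]
    x_next = np.clip(0.6 * x * np.exp(0.8 * tau) + 0.05 * rng.standard_normal(), -1.0, 1.0)
    cost = x_next**2 + 0.02 / tau      # performance penalty plus communication effort
    return x_next, cost

def lstdq(samples, w):
    """Least-squares evaluation of the policy that is greedy with respect to w."""
    k = len(w)
    A = 1e-6 * np.eye(k)               # small ridge term keeps A invertible
    b = np.zeros(k)
    for x, a, c, x_next in samples:
        f = phi(x, a)
        f_next = phi(x_next, greedy(w, x_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += f * c
    return np.linalg.solve(A, b)

def lspi(n_samples=500, n_iter=10, seed=0):
    """Collect random transitions, then alternate policy evaluation and improvement."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        a = int(rng.integers(len(TAUS)))
        x_next, c = step(x, a, rng)
        samples.append((x, a, c, x_next))
    w = np.zeros(len(TAUS) * (DEGREE + 1))
    for _ in range(n_iter):
        w_new = lstdq(samples, w)
        converged = np.linalg.norm(w_new - w) < 1e-6
        w = w_new
        if converged:
            break
    return w
```

Calling `w = lspi()` and then `TAUS[greedy(w, x)]` returns the interval the learned policy selects for error `x`. Because the action set is built from `TAU_MAX`, every selected interval respects the bound regardless of how well the approximator fits, which mirrors the role the stabilizing upper bounds play as optimization constraints.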
