Distributed Experiment Design and Control for Multi-agent Systems with Gaussian Processes

This paper focuses on distributed learning-based control of decentralized multi-agent systems where the agents’ dynamics are modeled by Gaussian Processes (GPs). Two fundamental problems are considered: the optimal design of experiment for concurrent learning of the agents’ GP models, and the distributed coordination given the learned models. Using a Distributed Model Predictive Control (DMPC) approach, the two problems are formulated as distributed optimization problems, where each agent’s sub-problem includes both local and shared objectives and constraints. To solve the resulting complex and non-convex DMPC problems efficiently, we develop an algorithm called Alternating Direction Method of Multipliers with Convexification (ADMM-C) that combines a distributed ADMM algorithm and a Sequential Convexification method. The computational efficiency of the proposed method stems from two facts: the computation for solving the DMPC problem is distributed among all agents, and each agent uses efficient convex optimization solvers for its convexified sub-problems. We also prove that, under some technical assumptions, the ADMM-C algorithm converges to a stationary point of the penalized optimization problem. The effectiveness of our approach is demonstrated in numerical simulations of a multi-vehicle formation control example.

I. INTRODUCTION

Multi-agent control systems have been studied extensively in recent decades due to their growing range of applications, such as building energy networks, smart grids, robotic swarms, and wireless sensor networks. Most control methods designed for single systems cannot be easily extended to multi-agent control systems due to additional challenges such as the combination of global and local tasks, limited communication and computation capabilities, and privacy requirements that limit information sharing between agents [1].
The centralized approach, in which a coordinator coordinates and manipulates all agents (with or without distributed computation), facilitates communication and information sharing between agents, but it does not scale well to a large number of agents due to physical constraints such as short communication ranges or the limited number of connections to the coordinator. For this reason, recent studies have focused on decentralized multi-agent control systems, in which the coordinator is eliminated and each agent in the network communicates and collaborates with a few other agents, called neighbors, to achieve the desired control objectives. Among the various control methods for single dynamical systems, Model Predictive Control (MPC) is an advanced technique that has been widely adapted to multi-agent systems due to its flexibility and efficiency in handling multiple control objectives and constraints. The extension of MPC to multi-agent systems is widely known as Distributed MPC (DMPC) [2]. To solve a DMPC problem in a distributed manner, distributed optimization algorithms are commonly used. In [3], the authors consider an optimal control problem for formation flight and develop an algorithm to solve it based on dual decomposition techniques. In [4], the Alternating Direction Method of Multipliers (ADMM) was used to solve a DMPC problem. In [5], the authors provided a computational study of the performance of two distributed optimization algorithms for DMPC problems: dual decomposition based on fast gradient updates (DDFG) and ADMM. Other distributed optimization algorithms used for DMPC include the fast alternating minimization algorithm (FAMA) and inexact FAMA [6], as well as the inexact Proximal Gradient Method and its accelerated variant [7]. In terms of applications, DMPC has been applied to numerous practical multi-agent systems, such as robotic swarms [8], [9] and building energy networks [10], [11].
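To illustrate how ADMM lets each agent solve its own subproblem while exchanging information only with its neighbors, the following is a minimal sketch of decentralized consensus ADMM on scalar quadratic costs. It is a standard textbook variant chosen for illustration, not the algorithm of [4], [5], or the ADMM-C developed in this paper; the costs, the ring graph, the penalty parameter c, and the function names are all hypothetical.

```python
import numpy as np

def neighbor_lists(M, edges):
    """Neighbors of agents 0..M-1 in an undirected graph.

    Unlike the set N_i defined in Section II, these lists exclude i itself.
    """
    nbrs = [[] for _ in range(M)]
    for i, j in edges:
        nbrs[i].append(j)  # communication is bidirectional,
        nbrs[j].append(i)  # so each edge is recorded at both ends
    return nbrs

def decentralized_consensus_admm(a, b, nbrs, c=1.0, iters=1000):
    """Agree on the minimizer of sum_i 0.5*a[i]*(x - b[i])**2 without a coordinator.

    Agent i holds a local copy x[i] and a dual variable alpha[i]; every update
    reads only agent i's own data and its neighbors' copies.
    """
    n = len(a)
    x = np.zeros(n)
    alpha = np.zeros(n)
    deg = np.array([len(nbrs[i]) for i in range(n)], dtype=float)
    for _ in range(iters):
        # Primal step: closed-form minimizer of the local quadratic
        # f_i(x) + alpha_i*x + c*sum_{j in nbrs[i]} (x - (x_i + x_j)/2)**2,
        # evaluated with the previous copies of agent i and its neighbors.
        nbr_sum = np.array([x[nbrs[i]].sum() for i in range(n)])
        x = (a * b - alpha + c * (deg * x + nbr_sum)) / (a + 2.0 * c * deg)
        # Dual step: accumulate disagreement with the neighbors' new copies.
        nbr_sum = np.array([x[nbrs[i]].sum() for i in range(n)])
        alpha = alpha + c * (deg * x - nbr_sum)
    return x

# Hypothetical example: four agents on a ring 0-1-2-3-0.
nbrs = neighbor_lists(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, -1.0, 2.0, 0.0])
x = decentralized_consensus_admm(a, b, nbrs)
print(x)  # all copies approach sum(a*b) / sum(a) = 0.5
```

Each agent only ever touches its own row of the data and the entries of x belonging to its neighbors, which is what removes the need for a central coordinator; the consensus value emerges through repeated neighbor-to-neighbor exchanges.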
In the above works, the dynamics of all agents are assumed to be available and sufficiently precise. However, for many complex dynamical systems, accurately modeling the system dynamics from physics is often not straightforward due to uncertainties and unmodeled dynamics. This challenge motivated us to develop learning-based DMPC for multi-agent systems in our previous paper [12], where Gaussian Processes (GPs) [13] were employed to learn the agents’ nonlinear dynamics, resulting in a GP-based DMPC (GP-DMPC) problem. To solve this GP-DMPC problem efficiently, we developed a distributed optimization algorithm called linGP-SCP-ADMM. However, in [12], we assumed that the GP dynamics of all agents are identical and available, which may not hold in practical applications since each agent has its own dynamics or system parameters. Consequently, the problem of how to obtain training datasets for all agents in one experiment was not addressed. Moreover, the convergence properties of the linGP-SCP-ADMM algorithm were not analyzed in that work. Therefore, in this paper, we formulate a GP-DMPC problem that covers two fundamental problems of learning-based control for decentralized multi-agent systems, namely the experiment design and coordination problems. In the experiment design problem, we utilize the receding horizon active learning approach [14] with exact conditional differential entropy to include individual learning objectives in the DMPC problem. To solve the complex, non-convex GP-DMPC problem, we develop a new algorithm called ADMM with Convexification (ADMM-C) that combines the distributed ADMM optimization method and the Sequential Convexification Programming (SCP) technique [15]. Note that ADMM-C is different from the linGP-SCP-ADMM algorithm presented in our previous work [12].
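The convexification idea behind ADMM-C can be sketched on a single agent’s local step. In an ADMM primal update, the local cost is typically augmented with a linear dual term and a quadratic penalty toward a reference built from neighbors’ values. The sketch below assumes such a generic form, F(u) = ||h(u) - r||^2 + lam·u + (rho/2)||u - v||^2, with the hypothetical stand-in model h(u) = sin(u) in place of a GP mean and a hypothetical proximal weight beta. It is not the paper’s exact ADMM-C subproblem: each step linearizes h at the current iterate and solves the resulting strongly convex quadratic program exactly.

```python
import numpy as np

def convexified_local_update(u0, r, lam, v, rho=1.0, beta=1.0, iters=100):
    """Sequential convexification of a nonconvex ADMM-style local subproblem.

    Minimizes F(u) = ||h(u) - r||^2 + lam @ u + (rho/2)*||u - v||^2 with the
    stand-in model h(u) = sin(u) (elementwise). Each step replaces h by its
    first-order expansion at the current iterate u_k and adds the proximal
    term (beta/2)*||u - u_k||^2, giving a strongly convex QP solved exactly.
    """
    u = np.asarray(u0, dtype=float).copy()
    n = u.size
    for _ in range(iters):
        h = np.sin(u)
        J = np.diag(np.cos(u))  # Jacobian of h at the current iterate
        # Optimality condition of the convex QP in the increment d = u_new - u:
        # (2 J^T J + (rho + beta) I) d = -(2 J^T (h - r) + lam + rho*(u - v))
        H = 2.0 * J.T @ J + (rho + beta) * np.eye(n)
        g = 2.0 * J.T @ (h - r) + lam + rho * (u - v)
        u = u + np.linalg.solve(H, -g)
    return u

def grad_F(u, r, lam, v, rho=1.0):
    """Gradient of the original (non-convexified) objective F."""
    return 2.0 * np.cos(u) * (np.sin(u) - r) + lam + rho * (u - v)

# Hypothetical data for one agent's local step.
r = np.array([0.5, -0.3])
lam = np.array([0.1, -0.1])
v = np.array([0.2, 0.0])
u_star = convexified_local_update(np.zeros(2), r, lam, v)
print(np.linalg.norm(grad_F(u_star, r, lam, v)))  # near zero: a stationary point of F
```

A fixed point of this iteration satisfies grad_F(u) = 0, so the sequence of convex subproblems targets a stationary point of the original nonconvex local cost. With the proximal term, each step is a regularized Gauss-Newton step; a practical implementation would hand the convexified subproblem, with its constraints, to a convex QP solver instead of the closed-form solve used here.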
In linGP-SCP-ADMM, at each iteration, we used the linearized Gaussian Process (linGP) [16] and the SCP method to form a convex GP-DMPC subproblem that can be solved by the convex distributed ADMM algorithm [17]; however, this method is not applicable to the problem considered in this paper, which includes the active learning objective. In contrast, ADMM-C is a variant of the ADMM algorithm for non-convex and non-smooth optimization [18], in which the convexification technique is used to solve the non-convex local subproblems at each ADMM iteration. In addition, the linGP-SCP-ADMM algorithm was designed for multi-agent systems with a coordinator, whereas ADMM-C is designed for decentralized systems. Under some technical assumptions, we prove that the ADMM-C algorithm converges to a stationary point of the penalized GP-DMPC problem. The effectiveness of our algorithm is demonstrated in a simulation case study of experiment design and formation control for a multi-vehicle system.

The remainder of this paper is organized as follows. The GP-DMPC formulation for distributed experiment design and coordination of a multi-agent system is introduced in Section II. Our proposed ADMM-C algorithm is presented in Section III, and the simulation results are reported and discussed in Section IV. Finally, Section V concludes the paper with a summary and some future directions.

II. PROBLEM FORMULATION

This section introduces a Gaussian Process-based Distributed Model Predictive Control (GP-DMPC) formulation for distributed experiment design and control problems of a multi-agent system, in which Gaussian Processes (GPs) are employed to represent the agent dynamics.
Our problem formulation covers two fundamental problems: (1) the multi-agent experiment design problem based on active learning, where each agent explores the state space to collect informative data for system identification while satisfying certain cooperative objectives and constraints with other agents, and (2) the multi-agent coordination problem, in which the agents cooperate to achieve both local and shared objectives using the obtained GP dynamic models.

Consider a decentralized multi-agent control system involving $M$ dynamical agents. We assume bidirectional communication between the agents, i.e., if agent $i$ can communicate with agent $j$, then agent $j$ can communicate with agent $i$. Consequently, the communication between agents in this network is described by an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, 2, \dots, M\}$ is the vertex set representing the agents and $\mathcal{E}$ is the edge set defining the connections between pairs of agents, i.e., $(i, j) \in \mathcal{E}$ means that agents $i$ and $j$ are neighbors. Moreover, we define $\mathcal{N}_i = \{j \mid (i, j) \in \mathcal{E}\}$ as the set of agent $i$’s neighbors (we assume that $i \in \mathcal{N}_i$), and the number of elements in $\mathcal{N}_i$ is denoted by $|\mathcal{N}_i|$.

For every agent $i$, we define its vector of control inputs as $u_i \in \mathbb{R}^{n_{u,i}}$, its vector of GP output variables as $y_i \in \mathbb{R}^{n_{y,i}}$, and its vector of non-GP variables as $z_i \in \mathbb{R}^{n_{z,i}}$. For any variable $\square_i$ of agent $i$, where $\square$ is $y$, $z$, or $u$, let $\square_{i,k}$ denote its value at time step $k$. The GP dynamics of agent $i$ express $y_i$ as $y_{i,k} \sim \mathcal{G}_i(x_{i,k}; m_i, k_i)$, where $\mathcal{G}_i(\cdot; m_i, k_i)$ is a GP with mean function $m_i$ and covariance function $k_i$. The input vector $x_{i,k}$ of the GP is formed from current and past values of the control inputs $u_{i,\tau}$ and non-GP states $z_{i,\tau}$, for $\tau \le k$, as well as from past GP outputs $y_{i,\tau}$, for $\tau < k$. Given an input $x_{i,k}$, let $\bar{y}_{i,k} = \mu_i(x_{i,k})$ denote the predicted mean of the GP model $\mathcal{G}_i(\cdot; m_i, k_i)$ at $x_{i,k}$.
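To make the GP quantities above concrete, the sketch below computes the predicted mean and the posterior covariance of a scalar-output GP at a set of query inputs, and evaluates the Gaussian differential entropy 0.5 * log det(2*pi*e*Sigma) of the outputs at those inputs conditioned on the training data, the kind of conditional-entropy quantity that an active-learning objective for experiment design can use. It assumes a zero prior mean function, a squared-exponential covariance with unit hyperparameters, and a fixed noise variance; these choices, the training data, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sqdist = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
              - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(X_train, y_train, X_query, noise_var=1e-2):
    """Posterior mean and covariance of noisy GP outputs at X_query.

    Assumes a zero prior mean function and the SE kernel above.
    """
    K = se_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    Ks = se_kernel(X_train, X_query)
    Kss = se_kernel(X_query, X_query) + noise_var * np.eye(len(X_query))
    mean = Ks.T @ np.linalg.solve(K, y_train)       # predicted mean mu(x)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)       # posterior covariance
    return mean, cov

def conditional_entropy(cov):
    """Differential entropy 0.5*log det(2*pi*e*cov) of a Gaussian with covariance cov."""
    _, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * cov)
    return 0.5 * logdet

# Hypothetical 1-D training data and two candidate sets of future GP inputs.
X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(X_train[:, 0])
mean_near, cov_near = gp_posterior(X_train, y_train, np.array([[0.25], [0.75]]))
mean_far, cov_far = gp_posterior(X_train, y_train, np.array([[3.0], [4.0]]))
print(conditional_entropy(cov_near) < conditional_entropy(cov_far))  # True
```

With fixed hyperparameters, the posterior covariance depends only on the inputs, not on the measured outputs, so the entropy of a planned input trajectory can be evaluated before the data are collected. Inputs far from the existing data carry higher entropy, which is why maximizing it drives an agent to explore.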
Note that in this paper, we only utilize the GP means, without uncertainty propagation, to represent the predicted values of the nonlinear dynamics. More details on GP regression for dynamics and control can be found in [19], [20]. Let $H > 0$ be the length of the MPC horizon, $t$ be the current time step, and $\mathcal{I}_t = \{t, \dots, t + H - 1\}$ be the set of all time steps in the MPC horizon at time step $t$. Denote by $\bar{Y}_{i,t} = \{\bar{y}_{i,k} \mid k \in \mathcal{I}_t\}$, $Z_{i,t} = \{z_{i,k} \mid k \in \mathcal{I}_t\}$, $U_{i,t} = \{u_{i,k} \mid k \in \mathcal{I}_t\}$, and $X_{i,t} = \{x_{i,k} \mid k \in \mathcal{I}_t\}$ the sets collecting the predicted GP output means, the non-GP states, the control inputs, and the GP inputs of agent $i$ over the MPC horizon. For each agent $i$, we define the concate

[1] Viet-Anh Le et al., Gaussian Process Based Distributed Model Predictive Control for Multi-agent Systems using Sequential Convex Programming and ADMM, 2020, 2020 IEEE Conference on Control Technology and Applications (CCTA).

[2] Manfred Morari et al., Learning and Control Using Gaussian Processes, 2018, 2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS).

[3] Manfred Morari et al., Computational aspects of distributed optimization in model predictive control, 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[4] Stephen P. Boyd et al., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[5] Wenwu Yu et al., An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination, 2012, IEEE Transactions on Industrial Informatics.

[6] Melanie Nicole Zeilinger et al., Inexact fast alternating minimization algorithm for distributed model predictive control, 2014, 53rd IEEE Conference on Decision and Control.

[7] Sandra Hirche et al., Localized active learning of Gaussian process state space models, 2020, L4DC.

[8] Benar Fux Svaiter et al., Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods, 2013, Math. Program.

[9] Goele Pipeleers et al., Distributed MPC for multi-vehicle systems moving in formation, 2017, Robotics Auton. Syst.

[10] Wotao Yin et al., Global Convergence of ADMM in Nonconvex Nonsmooth Optimization, 2015, Journal of Scientific Computing.

[11] Truong Nghiem et al., Linearized Gaussian Processes for Fast Data-driven Model Predictive Control, 2018, 2019 American Control Conference (ACC).

[12] Francesco Borrelli et al., Kinematic and dynamic vehicle models for autonomous driving control design, 2015, 2015 IEEE Intelligent Vehicles Symposium (IV).

[13] Dana Kulic et al., Stable Gaussian Process based Tracking Control of Euler-Lagrange Systems, 2018, Autom.

[14] Carl E. Rasmussen et al., Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.

[15] J. M. Maestre et al., Distributed Model Predictive Control: An Overview and Roadmap of Future Research Opportunities, 2014, IEEE Control Systems.

[16] Melanie Nicole Zeilinger et al., Quantization design for unconstrained distributed optimization, 2015, 2015 American Control Conference (ACC).

[17] John Lygeros et al., Distributed model predictive consensus via the Alternating Direction Method of Multipliers, 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[18] James E. Braun et al., Distributed model predictive control via Proximal Jacobian ADMM for building control applications, 2017, 2017 American Control Conference (ACC).

[19] Stephen P. Boyd et al., Distributed optimization for cooperative agents: application to formation flight, 2004, 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601).

[20] Angela P. Schoellig et al., Online Trajectory Generation With Distributed Model Predictive Control for Multi-Robot Motion Planning, 2020, IEEE Robotics and Automation Letters.

[21] Sebastian Trimpe et al., Actively Learning Gaussian Process Dynamics, 2019, L4DC.

[22] Viet-Anh Le et al., A Receding Horizon Approach for Simultaneous Active Learning and Control using Gaussian Processes, 2021, 2021 IEEE Conference on Control Technology and Applications (CCTA).

[23] Francesco Borrelli et al., A distributed predictive control approach to building temperature regulation, 2011, Proceedings of the 2011 American Control Conference.