Distributed adaptive dynamic programming for data-driven optimal control

Abstract: Adaptive dynamic programming (ADP) is an important optimal control technique that can be applied to data-driven control through an approximate, regression-based solution of the Hamilton–Jacobi–Bellman (HJB) equations. Distributed optimization algorithms, although extensively studied in statistics and machine learning, have not yet been applied to data-driven ADP problems. In this work, we formulate the data-driven ADP problem for nonlinear affine systems as a consensus optimization problem and solve it with the alternating direction method of multipliers (ADMM) and its accelerated variants. For the input-constrained optimal control problem, we define a combined optimal primal–dual function to develop a data-based version of the input-constrained HJB equation.
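To make the consensus idea concrete, the following is a minimal sketch of standard consensus ADMM applied to a least-squares regression whose data are split into blocks. It is not the algorithm of this work. It assumes a linear-in-parameters approximator, so that each row of A is a feature vector and b holds regression targets. The function name consensus_admm_lstsq, the synthetic data, and the choice of the penalty rho are all illustrative assumptions.

```python
import numpy as np

def consensus_admm_lstsq(blocks, rho=1.0, iters=200):
    """Consensus ADMM for min_w sum_i 0.5*||A_i w - b_i||^2 (illustrative sketch).

    blocks: list of (A_i, b_i) pairs, one per worker (hypothetical data split).
    Returns the consensus variable z after a fixed number of iterations.
    """
    n = blocks[0][0].shape[1]
    z = np.zeros(n)                       # shared consensus weights
    xs = [np.zeros(n) for _ in blocks]    # local copies of the weights
    us = [np.zeros(n) for _ in blocks]    # scaled dual variables
    # Each local subproblem is a ridge-like linear system; build it once.
    systems = [(A.T @ A + rho * np.eye(n), A.T @ b) for A, b in blocks]
    for _ in range(iters):
        # Local updates: could run in parallel, one per data block.
        for i, (H, g) in enumerate(systems):
            xs[i] = np.linalg.solve(H, g + rho * (z - us[i]))
        # Consensus update: average of local estimates plus duals.
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual updates: accumulate each block's disagreement with z.
        for i in range(len(blocks)):
            us[i] = us[i] + xs[i] - z
    return z

# Synthetic regression data (assumption: stands in for sampled features/targets).
rng = np.random.default_rng(0)
A = rng.standard_normal((120, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ w_true + 0.01 * rng.standard_normal(120)

# Split the samples into three blocks, as if held by three workers.
blocks = [(A[i::3], b[i::3]) for i in range(3)]
w_admm = consensus_admm_lstsq(blocks, rho=40.0)
w_central = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(w_admm - w_central)))
```

Each worker only solves a small linear system on its own block and exchanges weight vectors, not raw data. In this sketch rho is set near the scale of the local Gram matrices (about the number of rows per block), which tends to speed up convergence; in practice the penalty would need tuning.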
