Implicit regularization and momentum algorithms in nonlinear adaptive control and prediction

Stable concurrent learning and control of dynamical systems is the subject of adaptive control. Despite being an established field with many practical applications and a rich theory, much of the development in adaptive control for nonlinear systems revolves around a few key algorithms. By exploiting strong connections between classical adaptive nonlinear control techniques and recent progress in optimization and machine learning, we show that there exists considerable untapped potential in algorithm development for both adaptive nonlinear control and adaptive dynamics prediction. We first introduce first-order adaptation laws inspired by natural gradient descent and mirror descent. We prove that when the system is in the over-parameterized regime typical of modern machine learning, or when it is not persistently excited, such non-Euclidean adaptation laws implicitly regularize the learned model. We apply this result to the design of regularized dynamics predictors and observers, and consider Hamiltonian systems, Lagrangian systems, and recurrent neural networks as concrete examples. We subsequently develop a variational formalism based on the Bregman Lagrangian to define adaptation laws with momentum, applicable to linearly parameterized systems and to nonlinearly parameterized systems satisfying monotonicity or convexity requirements. We show that the Euler-Lagrange equations for the Bregman Lagrangian lead to natural gradient- and mirror descent-like adaptation laws with momentum, and that these recover their first-order analogues in the infinite-friction limit. We illustrate our analysis with simulations in which a higher-order algorithm for nonlinearly parameterized systems learns regularized hidden-layer weights in a three-layer feedforward neural network.
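
As a concrete illustration of the first-order laws, the following is a minimal sketch in standard mirror descent notation; the symbols $\psi$, $\gamma$, $Y$, $\tilde{x}$, and $\mathcal{A}$ are illustrative choices and are not quoted from the paper. For a linearly parameterized error model driven by a regressor $Y(x)$, a mirror descent-like adaptation law replaces the usual Euclidean integrator with the gradient of a strictly convex potential $\psi$,

\[ \frac{d}{dt}\,\nabla\psi\big(\hat{\theta}(t)\big) = -\gamma\, Y\big(x(t)\big)^{\top}\tilde{x}(t), \qquad \gamma > 0, \]

where $\tilde{x}$ denotes the prediction or tracking error. When the model is over-parameterized or the trajectory is not persistently exciting, the set $\mathcal{A}$ of parameters consistent with the observed dynamics is not a singleton, and implicit regularization in this setting means that the estimate converges to the Bregman projection of the initialization onto $\mathcal{A}$,

\[ \hat{\theta}(t) \to \operatorname*{arg\,min}_{\theta \in \mathcal{A}}\, d_{\psi}\big(\theta \,\Vert\, \hat{\theta}(0)\big), \qquad d_{\psi}(x \Vert y) = \psi(x) - \psi(y) - \big\langle \nabla\psi(y),\, x - y \big\rangle. \]

Taking $\psi(\theta) = \tfrac{1}{2}\Vert\theta\Vert_2^2$ recovers the classical Euclidean adaptation law, while potentials approximating the $\ell_1$ norm bias the learned model toward sparsity.

For the momentum laws, the generating object is the Bregman Lagrangian of Wibisono, Wilson, and Jordan; the expression below is a sketch under their ideal-scaling conditions, with $\ell$ a generic loss along the trajectory standing in for the paper's specific error models:

\[ \mathcal{L}\big(\hat{\theta}, \dot{\hat{\theta}}, t\big) = e^{\alpha_t + \gamma_t}\Big( d_{\psi}\big(\hat{\theta} + e^{-\alpha_t}\dot{\hat{\theta}},\; \hat{\theta}\big) - e^{\beta_t}\, \ell\big(\hat{\theta}, t\big) \Big). \]

For the Euclidean potential, the corresponding Euler-Lagrange equation is the damped second-order flow

\[ \ddot{\hat{\theta}} + \big(e^{\alpha_t} - \dot{\alpha}_t\big)\,\dot{\hat{\theta}} + e^{2\alpha_t + \beta_t}\, \nabla\ell\big(\hat{\theta}, t\big) = 0, \]

a heavy ball-like adaptation law with time-varying damping; letting the damping grow without bound, after a suitable time rescaling, collapses it back to the first-order law sketched above.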
