Shallow Representation is Deep: Learning Uncertainty-aware and Worst-case Random Feature Dynamics

Random features are powerful universal function approximators that inherit the theoretical rigor of kernel methods and scale to modern learning tasks. This paper views uncertain system models as unknown or uncertain smooth functions in universal reproducing kernel Hilbert spaces. By directly approximating the one-step dynamics function using random features with uncertain parameters, which is equivalent to a shallow Bayesian neural network, we view the rollout of the whole dynamical system as a multi-layer neural network. Exploiting the structure of Hamiltonian dynamics, we show that finding worst-case dynamics realizations using Pontryagin’s minimum principle is equivalent to performing the Frank-Wolfe algorithm on the deep net. Numerical experiments on dynamics learning showcase the capacity of our modeling methodology.
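To make the modeling idea concrete, the following is a minimal sketch, not the authors' code, of the first half of this pipeline: the one-step dynamics are approximated by random Fourier features with a Gaussian-uncertain linear readout (a shallow Bayesian network), and an H-step rollout composes that shallow map H times, giving the multi-layer-network view. The feature count `D`, bandwidth `sigma`, toy pendulum data, the shared readout covariance, and the helper names `phi` and `rollout` are all illustrative assumptions.

```python
# Sketch of uncertainty-aware random feature dynamics (assumptions noted inline).
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 2, 200, 1.0          # state dim, number of features, RBF bandwidth

# Random Fourier features for the RBF kernel (Rahimi & Recht, 2007).
W = rng.normal(0.0, 1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def phi(x):
    """Feature map phi(x) in R^D; the one-step model is f(x) ~ theta.T @ phi(x)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Toy data from a damped pendulum, used here only to fit the readout (assumption).
def true_step(x, dt=0.05):
    th, om = x
    return np.array([th + dt * om, om + dt * (-np.sin(th) - 0.1 * om)])

X = rng.uniform(-2, 2, size=(500, d))
Y = np.array([true_step(x) for x in X])
Phi = np.array([phi(x) for x in X])                    # (500, D)

# Ridge regression gives the mean readout; a scaled inverse Gram matrix stands in
# for the covariance of the uncertain parameters of the shallow Bayesian network.
lam = 1e-3
A = Phi.T @ Phi + lam * np.eye(D)
theta_mean = np.linalg.solve(A, Phi.T @ Y)             # (D, d)
theta_cov = 1e-4 * np.linalg.inv(A)                    # shared across outputs (assumption)

def rollout(x0, theta, H=40):
    """Depth-H composition of the one-step model: the 'deep net' view of the rollout."""
    traj = [x0]
    for _ in range(H):
        traj.append(theta.T @ phi(traj[-1]))
    return np.array(traj)

# Uncertainty-aware prediction: sample readouts, propagate each through the rollout.
x0 = np.array([1.0, 0.0])
L = np.linalg.cholesky(theta_cov)
samples = [rollout(x0, theta_mean + L @ rng.normal(size=(D, d))) for _ in range(20)]
print("mean final state:", np.mean([s[-1] for s in samples], axis=0))
```

In this picture, the worst-case analysis of the paper replaces the sampling step with an optimization over the uncertain readout: because the rollout is a deep composition with the readout entering linearly at each layer, a Pontryagin-style backward pass supplies the gradients that a Frank-Wolfe step over the parameter uncertainty set would use.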
