Sparse Methods for Automatic Relevance Determination

This work considers methods for imposing sparsity in Bayesian regression with applications in nonlinear system identification. We first review automatic relevance determination (ARD) and analytically demonstrate the need for additional regularization or thresholding to achieve sparse models. We then discuss two classes of methods, regularization-based and thresholding-based, which build on ARD to learn parsimonious solutions to linear problems. In the case of orthogonal covariates, we analytically demonstrate favorable performance with regard to learning a small set of active terms in a linear system with a sparse solution. Several example problems are presented to compare the proposed methods, in terms of their advantages and limitations relative to ARD, in bases with hundreds of elements. The aim of this paper is to analyze and understand the assumptions that lead to several algorithms and to provide theoretical and empirical results so that the reader may gain insight and make more informed choices regarding sparse Bayesian regression.
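As a rough illustration of the idea that ARD alone shrinks irrelevant coefficients without setting them exactly to zero, the following minimal sketch fits a standard ARD model and then applies a hard threshold with refitting. It is not one of the algorithms proposed in this work: the function names `ard_fit` and `thresholded_ard`, the threshold `tau`, the cap `alpha_max`, and the fixed-point hyperparameter updates are all assumptions chosen for illustration.

```python
import numpy as np

def ard_fit(Phi, y, n_iter=200, alpha_max=1e8):
    """Posterior mean of weights under an ARD prior w_i ~ N(0, 1/alpha_i).

    Uses the common fixed-point updates for alpha (per-weight precision)
    and beta (noise precision); an illustrative choice, not the paper's.
    """
    n, d = Phi.shape
    alpha = np.ones(d)
    beta = 1.0 / np.var(y)
    for _ in range(n_iter):
        # Gaussian posterior over weights given current hyperparameters.
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        mu = beta * Sigma @ Phi.T @ y
        # gamma_i in [0, 1] measures how well the data determine weight i.
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 0.0, 1.0)
        alpha = np.clip(gamma / (mu**2 + 1e-12), 1e-8, alpha_max)
        resid = y - Phi @ mu
        beta = (n - gamma.sum()) / (resid @ resid + 1e-12)
    return mu

def thresholded_ard(Phi, y, tau=0.1, max_rounds=10):
    """Alternate ARD fits with hard thresholding until the support is stable."""
    d = Phi.shape[1]
    active = np.ones(d, dtype=bool)
    w = np.zeros(d)
    for _ in range(max_rounds):
        if not active.any():
            break
        w = np.zeros(d)
        w[active] = ard_fit(Phi[:, active], y)
        keep = np.abs(w) >= tau  # drop terms with small posterior mean
        if np.array_equal(keep, active):
            break
        active = keep
    w[~active] = 0.0
    return w

# Synthetic example: 10 candidate terms, 3 of which are truly active.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[[1, 4, 7]] = [2.0, -1.5, 1.0]
y = Phi @ w_true + 0.1 * rng.standard_normal(200)

w_ard = ard_fit(Phi, y)             # small but generally nonzero inactive terms
w_sparse = thresholded_ard(Phi, y)  # exact zeros outside the kept support
print(np.round(w_ard, 3))
print(np.round(w_sparse, 3))
```

The threshold `tau` has to be chosen relative to the expected coefficient scale and noise level, so the value used here is specific to this synthetic problem.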
