A Bayesian approach to sparse dynamic network identification

Modeling and identification of high dimensional systems, involving signals with many components, pose severe challenges to off-the-shelf techniques for system identification. This is particularly so when data sets that are small relative to the number of signal components have to be used. It is often the case that each component of the measured signal can be described in terms of a few other measured variables, and these dependences can be encoded graphically via so-called "Dynamic Bayesian Networks". The problem of finding the interconnection structure as well as estimating the dynamic models can be posed as a system identification problem which involves variable selection. While this variable selection could be performed via standard selection techniques, computational complexity may be a critical issue, being combinatorial in the number of inputs and outputs. In this paper we introduce two new nonparametric techniques which borrow ideas from a recently introduced kernel estimator called "stable-spline" as well as from sparsity inducing priors which use ℓ1-type penalties. Numerical experiments regarding estimation of large scale sparse (ARMAX) models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization.
