Sparse modeling using orthogonal forward regression with PRESS statistic and regularization

The paper introduces an efficient construction algorithm for obtaining sparse linear-in-the-weights regression models by directly optimizing model generalization capability. This is achieved by utilizing the delete-1 cross-validation concept and the associated leave-one-out test error, also known as the predicted residual sums of squares (PRESS) statistic, without resorting to a separate validation data set for model evaluation during model construction. Computational efficiency is ensured by using orthogonal forward regression, but the algorithm incrementally minimizes the PRESS statistic instead of the usual sum of squared training errors. A local regularization method can naturally be incorporated into the model selection procedure to further enforce model sparsity. The proposed algorithm is fully automatic: the user is not required to specify any criterion to terminate the model construction procedure. Comparisons with some existing state-of-the-art modeling methods are given, and several examples are included to demonstrate the ability of the proposed algorithm to construct sparse models that generalize well.
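To make the selection loop concrete, below is a minimal Python sketch (not the authors' implementation; the function names are illustrative) of forward regressor selection driven by the closed-form leave-one-out error. For a least-squares model the i-th delete-1 residual equals e_i / (1 - h_ii), where h_ii is the i-th leverage, so PRESS can be evaluated without refitting n times. For clarity this sketch recomputes the hat matrix at every step; the paper instead orthogonalizes the regressors incrementally (orthogonal forward regression) so PRESS can be updated cheaply, and it further incorporates local regularization, both of which are omitted here.

```python
import numpy as np

def press_statistic(Phi, y):
    # Closed-form leave-one-out (delete-1) error for a linear-in-the-weights
    # least-squares model: the i-th LOO residual is e_i / (1 - h_ii), with
    # h_ii the i-th leverage (diagonal of the hat matrix), so no refitting
    # is needed.
    H = Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T   # hat matrix
    e = y - H @ y                                   # ordinary residuals
    h = np.clip(np.diag(H), 0.0, 1.0 - 1e-12)       # guard against h_ii = 1
    return float(np.sum((e / (1.0 - h)) ** 2))

def forward_select_press(candidates, y, max_terms=None):
    # Greedy forward selection: at each step add the candidate regressor
    # (column of `candidates`) that yields the smallest PRESS, and stop as
    # soon as PRESS no longer decreases, so no user-specified termination
    # criterion is needed.
    n, m = candidates.shape
    max_terms = max_terms if max_terms is not None else m
    selected, best_press = [], np.inf
    while len(selected) < max_terms:
        p, j = min((press_statistic(candidates[:, selected + [j]], y), j)
                   for j in range(m) if j not in selected)
        if p >= best_press:
            break
        selected.append(j)
        best_press = p
    return selected, best_press
```

Because PRESS estimates generalization error directly, the loop terminates automatically at the point where adding a further regressor would begin to overfit, which mirrors the fully automatic stopping behavior described in the abstract.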
