Simultaneous input variable and basis function selection for RBF networks

Input selection is advantageous in regression problems. It can, for example, decrease model training time, reduce measurement costs, and help circumvent the curse of dimensionality. Moreover, including useless inputs in a model increases the likelihood of overfitting. Neural networks generalize well in many cases, but their interpretability is usually limited. Selecting a subset of variables and estimating their relative importance would therefore be valuable in many real-world applications. In the present work, a method for simultaneous input and basis function selection in a radial basis function (RBF) network is proposed. The selection is performed by minimizing a constrained optimization problem in which the sparsity of the network is controlled by two continuous-valued shrinkage parameters. Each input dimension is weighted, and the constraints are imposed on these weights and on the output layer coefficients. Direct and alternating optimization (AO) procedures are presented to solve the problem. The proposed method is applied to simulated and benchmark data. In comparisons with existing methods, the resulting RBF networks achieve similar prediction accuracy with smaller numbers of inputs and basis functions.
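To make the alternating-optimization idea concrete, the following is a minimal sketch, not the authors' exact formulation: it assumes a Gaussian kernel, replaces the paper's explicit constraints with the equivalent penalty (Lagrangian) form, and alternates between a lasso-type step for the output coefficients and a projected numerical-gradient step for the nonnegative input weights. All names (fit_rbf, lam_w, lam_a, etc.) are illustrative, and lam_w, lam_a merely stand in for the paper's two continuous-valued shrinkage parameters.

import numpy as np

def design_matrix(X, centers, w):
    # Phi[i, k] = exp(-||w * (x_i - c_k)||^2); the input weights w scale
    # each dimension, so w_j -> 0 effectively removes input j everywhere.
    diff = X[:, None, :] - centers[None, :, :]        # shape (n, K, d)
    return np.exp(-np.sum((diff * w) ** 2, axis=2))   # shape (n, K)

def lasso_cd(Phi, y, lam, n_iter=200):
    # Coordinate descent for min_a 0.5*||y - Phi a||^2 + lam*||a||_1;
    # soft-thresholding drives coefficients of unneeded basis functions
    # exactly to zero, which is what prunes the hidden layer.
    n, K = Phi.shape
    a = np.zeros(K)
    col_sq = np.sum(Phi ** 2, axis=0) + 1e-12
    for _ in range(n_iter):
        for k in range(K):
            r = y - Phi @ a + Phi[:, k] * a[k]        # partial residual
            rho = Phi[:, k] @ r
            a[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
    return a

def fit_rbf(X, y, K=20, lam_w=0.1, lam_a=0.1, n_outer=20, lr=1e-2, seed=0):
    # Illustrative AO loop (hypothetical names, penalty form of the problem).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=K, replace=False)]  # centers at data points
    w = np.ones(d)
    a = np.zeros(K)
    for _ in range(n_outer):
        # a-step: with w fixed the model is linear in a -> an L1 problem.
        Phi = design_matrix(X, centers, w)
        a = lasso_cd(Phi, y, lam_a)
        # w-step: penalized least squares in w via a few projected
        # central-difference gradient steps, keeping w >= 0.
        def obj(w_):
            resid = y - design_matrix(X, centers, w_) @ a
            return 0.5 * resid @ resid + lam_w * np.sum(np.abs(w_))
        for _ in range(10):
            g = np.array([(obj(w + 1e-5 * e) - obj(w - 1e-5 * e)) / 2e-5
                          for e in np.eye(d)])
            w = np.maximum(w - lr * g, 0.0)            # projection onto w >= 0
    return w, a, centers

In this sketch the two sparsity mechanisms act at different levels: an input weight w_j shrunk to zero removes input j from every basis function at once, while a coefficient a_k soft-thresholded to zero prunes an individual basis function, mirroring the simultaneous input and basis function selection described above.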
