An experimental study on methods for the selection of basis functions in regression

A comparative study is carried out on the problem of selecting a subset of basis functions in regression tasks. The emphasis is placed on practical requirements, such as the sparsity of the solution and the computational effort. A distinction is made according to the implicit or explicit nature of the selection process. Explicit methods select the basis functions from a set of candidates through a search process; implicit methods fit a model containing all the basis functions and compute the model parameters in such a way that several of them become zero. The former have the advantage that both the sparsity and the computational effort can be controlled directly. We build on earlier work on Bayesian interpolation to design efficient explicit selection methods guided by the model evidence, since there is strong indication that the evidence favors simple models that generalize well. Our experimental results indicate that implicit and explicit methods achieve very similar generalization performance; however, they use different numbers of basis functions and incur very different computational costs. It is also observed that the models with the highest evidence are not necessarily those with the best generalization performance.
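As a concrete illustration of the explicit, evidence-guided approach, the sketch below performs greedy forward selection: at each step it adds the candidate basis function that most increases the log evidence of the resulting linear model. The evidence formula follows the standard Bayesian linear-regression setting of MacKay's Bayesian interpolation (Gaussian weight prior with precision alpha, Gaussian noise with precision beta); the fixed hyperparameters, the Gaussian candidate basis, and all names here are illustrative assumptions, not the code or exact methods evaluated in the study.

```python
# Minimal sketch of explicit, evidence-guided forward selection of basis
# functions for linear regression. Hyperparameters alpha/beta are held fixed
# for simplicity; in practice they would be re-estimated (assumption, not the
# paper's implementation).
import numpy as np

def log_evidence(Phi, t, alpha=1.0, beta=25.0):
    """Log marginal likelihood of t = Phi w + noise, with prior
    w ~ N(0, alpha^{-1} I) and noise precision beta."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi      # posterior precision
    m = beta * np.linalg.solve(A, Phi.T @ t)        # posterior mean
    E = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * (m @ m)
    _, logdet = np.linalg.slogdet(A)
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E - 0.5 * logdet - 0.5 * N * np.log(2 * np.pi))

def greedy_select(candidates, t, n_select):
    """Explicit selection: grow the model one basis function at a time,
    keeping the candidate whose inclusion maximizes the evidence."""
    chosen, remaining = [], list(range(candidates.shape[1]))
    for _ in range(n_select):
        scores = [(log_evidence(candidates[:, chosen + [j]], t), j)
                  for j in remaining]
        _, best_j = max(scores)
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

# Toy usage: Gaussian bumps as candidates, noisy sine data (illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
centers = np.linspace(0, 1, 20)
candidates = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.1) ** 2)
print(greedy_select(candidates, t, n_select=5))
```

Because the number of selection steps is fixed in advance, both the sparsity of the final model and the cost of the search are controlled explicitly, which is the practical advantage of explicit methods noted above.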
