Local radial basis function network regressor with feature importance optimization

Recent big data analysis often involves datasets whose features are collected from various sources, so each feature may have a different importance, and the training data may not be uniformly sampled. To improve prediction quality on real-world learning problems, we propose a local radial basis function network (RBFN) that handles both nonuniform sampling density and heterogeneous features. Nonuniform sampling is addressed by estimating the local sampling density and adjusting the width of each Gaussian kernel accordingly. Heterogeneous features are handled by scaling each dimension of the feature space by its own factor. To make the learner aware of inter-feature relationships, we propose a feature importance optimization technique based on the L-BFGS-B algorithm, using the leave-one-out cross-validation (LOOCV) mean squared error as the objective function. LOOCV has traditionally been very time-consuming, but the fast cross-validation capability of the local RBFN makes this optimization practical. Our experiments show that when both nonuniform sampling density and inter-feature relationships are properly handled, a simple RBFN can outperform more complex kernel-based learning models, such as support vector regression, in both mean squared error and training speed.
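The abstract does not spell out the model, so the following Python sketch (NumPy and SciPy) only illustrates how the described pieces can fit together. It is not the authors' implementation and rests on several assumptions. Every training point is a Gaussian center. The width of center i is `beta` times its distance to the k-th nearest neighbor, which serves as a simple proxy for local sampling density. The regressor is a normalized (Nadaraya-Watson-style) weighted average, which makes LOOCV cheap because holding out a point only means zeroing its own kernel weight. Feature importances are per-dimension scale factors fitted with SciPy's `minimize(method="L-BFGS-B")` on the LOO mean squared error. The names `local_widths`, `loo_mse`, `fit_feature_scales`, `predict`, `k`, `beta`, and `upper` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist


def local_widths(Z, k=5, beta=1.0):
    """Per-center Gaussian widths: beta times the distance to the k-th nearest
    neighbour, so centers in sparsely sampled regions get wider kernels.
    (Assumed density proxy; the paper's estimator may differ.)"""
    D2 = cdist(Z, Z, "sqeuclidean")
    # Column 0 of each sorted row is the point itself (distance 0).
    kth = np.sqrt(np.sort(D2, axis=1)[:, k])
    return beta * kth + 1e-12, D2


def loo_mse(scales, X, y, k=5, beta=1.0):
    """Leave-one-out MSE of a normalized Gaussian RBF regressor whose centers
    are the training points, in the feature space rescaled by `scales`."""
    Z = X * scales                      # per-feature importance scaling
    sigma, D2 = local_widths(Z, k, beta)
    # W[a, i]: value of the kernel centered at point i, evaluated at point a.
    W = np.exp(-D2 / (2.0 * sigma[None, :] ** 2))
    np.fill_diagonal(W, 0.0)            # LOO: each point ignores its own kernel
    pred = W @ y / (W.sum(axis=1) + 1e-12)
    return float(np.mean((pred - y) ** 2))


def fit_feature_scales(X, y, k=5, beta=1.0, upper=10.0):
    """Minimize the LOO MSE over non-negative per-feature scales with L-BFGS-B.
    No gradient is supplied, so SciPy approximates it by finite differences."""
    x0 = np.ones(X.shape[1])
    bounds = [(0.0, upper)] * X.shape[1]
    res = minimize(loo_mse, x0, args=(X, y, k, beta),
                   method="L-BFGS-B", bounds=bounds)
    return res.x, float(res.fun)


def predict(X_train, y_train, X_new, scales, k=5, beta=1.0):
    """Predict with all training centers (no LOO) using the fitted scales."""
    Z = X_train * scales
    sigma, _ = local_widths(Z, k, beta)
    D2 = cdist(X_new * scales, Z, "sqeuclidean")
    W = np.exp(-D2 / (2.0 * sigma[None, :] ** 2))
    return W @ y_train / (W.sum(axis=1) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Nonuniform sampling on feature 0; feature 1 is pure noise.
    X = np.column_stack([rng.beta(2.0, 5.0, size=80), rng.uniform(size=80)])
    y = np.sin(6.0 * X[:, 0])
    scales, err = fit_feature_scales(X, y)
    print("scales:", scales, "LOO MSE:", err)
```

In this sketch the widths are measured in the same rescaled space as the distances. As a result, multiplying all scale factors by a common constant leaves the predictions essentially unchanged, and only the relative scales carry importance information. The widths are also computed once from all inputs, since they do not depend on the targets. This simplifies the leave-one-out computation, but it is an approximation of a strict LOOCV.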
