Efficient construction of sparse radial basis function neural networks using L1-regularization

This paper investigates the construction of sparse radial basis function neural networks (RBFNNs) for classification problems. An efficient two-phase construction algorithm based on L1 regularization, abbreviated TPCLR1, is proposed. In the first phase, an improved maximum data coverage (IMDC) algorithm initializes the RBF centers and widths. In the second phase, a specialized Orthant-Wise Limited-memory Quasi-Newton (sOWL-QN) method performs network pruning and parameter optimization simultaneously. TPCLR1 yields sparser networks with better generalization performance, and it substantially reduces the required storage space and testing time. Moreover, only two quantities must be prescribed, the regularization parameter and the maximum number of function evaluations, after which the entire construction procedure is automatic. The algorithm is evaluated on several classification benchmarks of varying complexity. The experimental results show that an appropriate value of the regularization parameter is easy to find without costly cross-validation, and that TPCLR1 offers an efficient procedure for constructing sparse RBFNN classifiers with good generalization performance.
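The abstract does not spell out the optimization details, so the following is only a minimal sketch of the underlying idea, under loudly stated assumptions: binary 0/1 labels, Gaussian hidden units with hand-picked centers and a fixed width (a stand-in for IMDC, not IMDC itself), and a logistic output unit. It also swaps in plain proximal gradient descent (ISTA, gradient step followed by soft-thresholding) in place of sOWL-QN, and it optimizes only the output weights, whereas TPCLR1 also tunes centers and widths. What it does illustrate is the mechanism behind L1-based pruning: the penalty drives some output weights to exactly zero, so the corresponding hidden units can be deleted. All function names (`rbf_layer`, `train_l1_output_weights`, and so on) and the toy data are hypothetical.

```python
import math
import random


def rbf_layer(X, centers, widths):
    """Gaussian hidden units: phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma_j^2))."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * s * s))
         for c, s in zip(centers, widths)]
        for x in X
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward zero, exact zero when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def train_l1_output_weights(Phi, y, lam, iters=1000):
    """Proximal-gradient (ISTA) minimization of
    mean logistic loss + lam * ||w||_1 over output weights w (bias not penalized).
    Assumes 0/1 labels in y. Returns (w, b)."""
    n, m = len(Phi), len(Phi[0])
    # Phi entries lie in (0, 1], so the gradient's Lipschitz constant is at most (m + 1) / 4.
    step = 4.0 / (m + 1)
    w, b = [0.0] * m, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * m, 0.0
        for phi, t in zip(Phi, y):
            r = (sigmoid(b + sum(wj * pj for wj, pj in zip(w, phi))) - t) / n
            gb += r
            for j in range(m):
                gw[j] += r * phi[j]
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * g, step * lam) for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def predict(Phi, w, b):
    return [1 if b + sum(wj * pj for wj, pj in zip(w, phi)) >= 0 else 0 for phi in Phi]


# Hypothetical toy data: two Gaussian blobs in 2-D.
rng = random.Random(0)
X = [(rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)) for _ in range(20)] + \
    [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(20)]
y = [0] * 20 + [1] * 20
# Candidate centers: 5 training points per class plus two far-away, useless centers.
centers = X[:5] + X[20:25] + [(10.0, -10.0), (-10.0, 10.0)]
widths = [1.0] * len(centers)
Phi = rbf_layer(X, centers, widths)

w, b = train_l1_output_weights(Phi, y, lam=0.01)
kept = [j for j, wj in enumerate(w) if wj != 0.0]  # hidden units that survive pruning
accuracy = sum(p == t for p, t in zip(predict(Phi, w, b), y)) / len(y)
```

Here the two distant centers receive exactly zero weight and can be removed from the network. Raising `lam` prunes more units; in this formulation every gradient entry is smaller than 1 in magnitude, so with `lam >= 1` no weight ever leaves zero, which mirrors the abstract's point that the regularization parameter is the main knob controlling sparsity.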