A radial basis function network classifier to maximise leave-one-out mutual information

We develop an orthogonal forward selection (OFS) approach to construct radial basis function (RBF) network classifiers for two-class problems. Our approach integrates several concepts in probabilistic modelling, including cross-validation, mutual information and Bayesian hyperparameter fitting. At each stage of the OFS procedure, one model term is selected by maximising the leave-one-out mutual information (LOOMI) between the classifier's predicted class labels and the true class labels. We derive a formula for the LOOMI within the OFS framework so that it can be evaluated efficiently for model term selection. Furthermore, a Bayesian hyperparameter fitting procedure is integrated into each stage of the OFS to infer the ℓ2-norm based local regularisation parameter from the data. Since each forward stage effectively fits a one-variable model, this task is very fast. The classifier construction procedure terminates automatically, without the need for an additional stopping criterion, and yields very sparse RBF classifiers with excellent classification generalisation performance. This is particularly useful for noisy data sets with highly overlapping class distributions. A number of benchmark examples are employed to demonstrate the effectiveness of the proposed approach.
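The selection loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation or their derived LOOMI formula: class labels are coded as ±1 and the classifier output is thresholded at zero; leave-one-out outputs are obtained from the standard regularised least-squares identity ŷ(-k)(k) = y(k) − e(k)/η(k), where η(k) = 1 − Σ_i w_i(k)²/(w_iᵀw_i + λ) is updated term by term over the orthogonalised regressors w_i; and a single fixed regularisation parameter `lam` stands in for the per-stage Bayesian hyperparameter update, which is not reproduced here. The function names `mutual_information`, `loo_labels` and `ofs_loomi` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def mutual_information(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Empirical mutual information (in bits) between two +1/-1 label sequences."""
    n = len(y_true)
    counts = {}
    for t, p in zip(y_true, y_pred):
        counts[(t, p)] = counts.get((t, p), 0) + 1
    # Marginal label frequencies (assumption: labels are exactly +1 or -1).
    p_true = {c: sum(v for (t, _), v in counts.items() if t == c) / n for c in (-1, 1)}
    p_pred = {c: sum(v for (_, p), v in counts.items() if p == c) / n for c in (-1, 1)}
    mi = 0.0
    for (t, p), v in counts.items():
        p_joint = v / n
        mi += p_joint * math.log2(p_joint / (p_true[t] * p_pred[p]))
    return mi


def loo_labels(y: Sequence[float], e: Sequence[float], eta: Sequence[float]) -> List[int]:
    """Sign of the leave-one-out output y(k) - e(k)/eta(k); a zero output is labelled +1."""
    return [1 if yk - ek / hk >= 0 else -1 for yk, ek, hk in zip(y, e, eta)]


def ofs_loomi(candidates: Sequence[Sequence[float]], y: Sequence[int],
              lam: float = 1e-3, max_terms: Optional[int] = None,
              tol: float = 1e-12) -> Tuple[List[int], List[float]]:
    """Greedy orthogonal forward selection maximising LOO mutual information.

    Assumption: one fixed lam > 0 for every term (no Bayesian update).
    Returns the selected candidate indices and the LOOMI after each stage.
    """
    if lam <= 0:
        raise ValueError("lam must be positive so that eta(k) stays above zero")
    n = len(y)
    e = [float(v) for v in y]   # residuals of the empty model (output 0)
    eta = [1.0] * n             # 1 minus the hat-matrix diagonal
    basis: List[List[float]] = []  # orthogonalised columns already in the model
    selected: List[int] = []
    history: List[float] = []
    current_mi = 0.0            # an empty model predicts a single class: MI = 0
    limit = len(candidates) if max_terms is None else max_terms
    while len(selected) < limit:
        best = None
        for j, p in enumerate(candidates):
            if j in selected:
                continue
            # Gram-Schmidt step: remove components along the selected basis.
            w = [float(v) for v in p]
            for b in basis:
                coef = sum(bi * wi for bi, wi in zip(b, w)) / sum(bi * bi for bi in b)
                w = [wi - coef * bi for wi, bi in zip(w, b)]
            ww = sum(wi * wi for wi in w)
            if ww < tol:
                continue  # numerically dependent on the terms already chosen
            g = sum(wi * yi for wi, yi in zip(w, y)) / (ww + lam)
            e_new = [ek - wk * g for ek, wk in zip(e, w)]
            eta_new = [hk - wk * wk / (ww + lam) for hk, wk in zip(eta, w)]
            mi = mutual_information(y, loo_labels(y, e_new, eta_new))
            if best is None or mi > best[0]:
                best = (mi, j, w, e_new, eta_new)
        if best is None or best[0] <= current_mi:
            break  # LOOMI no longer increases: stop without an extra criterion
        current_mi, j, w, e, eta = best
        selected.append(j)
        basis.append(w)
        history.append(current_mi)
    return selected, history
```

For example, with inputs `x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]`, labels `y = [-1, -1, -1, -1, 1, 1, 1, 1]` and the two candidate columns `x` and `[1, 0, 1, 0, 1, 0, 1, 0]`, the sketch selects only the first column and stops once no further term raises the LOOMI above 1 bit. In an RBF setting each candidate column would instead hold one basis function evaluated at every training input. Note that mutual information is unchanged when the predicted labels are swapped, so this sketch would score a consistently inverted classifier as highly as a correct one.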
