Feature selection using localized generalization error for supervised classification problems using RBFNN

Pattern classification problems often involve high-dimensional feature vectors that make the classifier complex and difficult to train. Without feature reduction, both training accuracy and generalization capability suffer. This paper proposes a novel hybrid filter-wrapper feature subset selection methodology based on a localized generalization error model. The localized generalization error model for a radial basis function neural network (RBFNN) bounds from above the generalization error for unseen samples located within a neighborhood of the training samples. The method iteratively removes the feature that contributes least to this generalization error bound. Moreover, the proposed method is independent of the sample size and is computationally fast. Experimental results show that it consistently removes large percentages of features with statistically insignificant loss of testing accuracy on unseen samples. On two of the datasets, classifiers built from feature subsets with 90% of features removed by the proposed approach achieve higher average testing accuracies than classifiers trained on the full feature set. Finally, we corroborate the efficacy of the model by using it to predict corporate bankruptcies in the US.
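The selection loop described above (repeatedly dropping the feature whose removal least degrades a generalization-oriented criterion) can be sketched as follows. This is a minimal illustration only: the paper's actual criterion is the localized generalization error bound of a trained RBFNN, which is not reproduced here; `loo_error` below is a hypothetical stand-in using the leave-one-out error of a simple Gaussian-kernel classifier, and all names and parameters are illustrative assumptions.

```python
import numpy as np

def loo_error(X, y, width=1.0):
    """Leave-one-out error of a simple Gaussian-kernel classifier.

    Hypothetical stand-in for the paper's localized generalization
    error bound; it only serves to drive the selection loop below.
    """
    n = len(y)
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * width ** 2))
    np.fill_diagonal(K, 0.0)  # exclude each sample from its own vote
    errors = 0
    for i in range(n):
        # Kernel-weighted class vote over the remaining samples.
        scores = {c: K[i, y == c].sum() for c in np.unique(y)}
        pred = max(scores, key=scores.get)
        errors += int(pred != y[i])
    return errors / n

def backward_elimination(X, y, n_keep, criterion=loo_error):
    """Iteratively drop the feature whose removal hurts the criterion least."""
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        # Score each candidate subset obtained by removing one feature.
        candidates = [(criterion(X[:, [f for f in kept if f != j]], y), j)
                      for j in kept]
        _, least_useful = min(candidates)  # smallest error after removal
        kept.remove(least_useful)
    return kept
```

On a toy dataset where one feature separates the classes and another is pure noise, the loop retains the informative feature, mirroring the paper's observation that large fractions of features can be removed without hurting accuracy.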
