Feature Extraction for Classification Using Statistical Networks

In a classification problem, the dimension of the measurement vector is often large, and some of these measurements may be unimportant for separating the classes. Removing such measurement variables not only reduces the computational cost but also leads to a better understanding of class separability. The existing literature offers several methods for reducing the dimensionality of a classification problem without losing much of the separability information; however, these dimension reduction procedures usually work well only for linear classifiers. When the competing classes are not linearly separable, one has to look for suitable "features", which may be transformations of one or more measurements. In this paper, we attempt to tackle both problems, dimension reduction and feature extraction, by considering a projection pursuit regression model. The single hidden layer perceptron model and several other popular models can be viewed as special cases of this model. An iterative algorithm based on backfitting is proposed to select the features dynamically, and cross-validation is used to select the appropriate number of features. We carry out an extensive simulation study to show the effectiveness of this fully automatic method.
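The core of the approach described above is a projection pursuit regression model, fitted by backfitting: the response is modeled as a sum of ridge functions g_m(a_m^T x), and each ridge function is refitted in turn against the residual left by the others. The sketch below illustrates only this backfitting idea under simplifying assumptions (fixed projection directions and polynomial ridge functions); it is not the authors' algorithm, which also selects the directions dynamically and chooses the number of features by cross-validation.

```python
import numpy as np

def fit_ppr_backfit(X, y, directions, degree=3, n_sweeps=10):
    """Backfit polynomial ridge functions g_m(a_m^T x) for FIXED
    projection directions (a simplification: full projection pursuit
    regression also optimizes the directions a_m themselves)."""
    M = len(directions)
    coefs = [np.zeros(degree + 1) for _ in range(M)]
    proj = [X @ a for a in directions]          # projections a_m^T x
    for _ in range(n_sweeps):
        for m in range(M):
            # partial residual: y minus all the OTHER ridge terms
            others = sum(np.polyval(coefs[k], proj[k])
                         for k in range(M) if k != m)
            coefs[m] = np.polyfit(proj[m], y - others, degree)
    return coefs

def predict_ppr(X, directions, coefs):
    """Sum the fitted ridge functions over the projections."""
    return sum(np.polyval(c, X @ a) for a, c in zip(directions, coefs))
```

For a two-class problem, y can be taken as a 0/1 class indicator and the fitted surface thresholded at 1/2, which is one common way to turn such a regression fit into a classifier.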
