SVM decision boundary based discriminative subspace induction

We study the problem of linear dimension reduction for classification, with a focus on sufficient dimension reduction, i.e., finding subspaces that preserve all discrimination power. First, we formulate the concept of a sufficient subspace for classification in terms parallel to those used for regression. We then present a new method for estimating the smallest sufficient subspace, based on an improvement of decision boundary analysis (DBA). The main idea is to combine DBA with support vector machines (SVM) to overcome the inherent difficulty of DBA in small-sample-size situations while keeping DBA's estimation simplicity. The compact representation of the SVM boundary yields a significant gain in both speed and accuracy over previous DBA implementations. Alternatively, the technique can be viewed as a way to reduce the run-time complexity of the SVM itself. Comparative experiments on one simulated and four real-world benchmark datasets show that the proposed approach outperforms the methods it is compared against.
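The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under several assumptions: the decision function is an RBF kernel expansion of the kind an SVM produces, boundary points are located by bisection between oppositely classified samples, and the subspace is read off the eigenvectors of the averaged outer products of unit boundary normals (the usual DBA construction). The function names, the hand-specified "support vectors", and all parameter values are hypothetical and stand in for a trained SVM.

```python
import numpy as np

def rbf_decision(X, sv, coef, b, gamma):
    """f(x) = sum_i coef_i * exp(-gamma * ||x - sv_i||^2) + b, for each row of X."""
    d2 = ((X[:, None, :] - sv[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2) @ coef + b

def rbf_gradient(x, sv, coef, gamma):
    """Gradient of the RBF kernel expansion at a single point x."""
    diff = x - sv                                   # shape (n_sv, d)
    k = np.exp(-gamma * (diff ** 2).sum(axis=1))    # kernel values
    return (-2.0 * gamma) * ((coef * k) @ diff)

def boundary_point(x_pos, x_neg, f, tol=1e-10, max_iter=200):
    """Bisection on the segment between a point with f > 0 and one with f < 0."""
    lo, hi = x_neg, x_pos
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
        if np.linalg.norm(hi - lo) < tol:
            break
    return 0.5 * (lo + hi)

def dba_subspace(X, sv, coef, b, gamma, n_pairs=200, seed=0):
    """Eigen-decompose the average outer product of unit boundary normals."""
    rng = np.random.default_rng(seed)
    f = lambda Z: rbf_decision(np.atleast_2d(Z), sv, coef, b, gamma)
    scores = f(X)
    pos, neg = X[scores > 0], X[scores < 0]
    d = X.shape[1]
    M = np.zeros((d, d))
    for _ in range(n_pairs):
        p = pos[rng.integers(len(pos))]
        q = neg[rng.integers(len(neg))]
        z = boundary_point(p, q, lambda v: f(v)[0])
        g = rbf_gradient(z, sv, coef, gamma)
        n = g / np.linalg.norm(g)                   # unit normal at z
        M += np.outer(n, n)
    M /= n_pairs
    evals, evecs = np.linalg.eigh(M)                # ascending order
    order = np.argsort(evals)[::-1]                 # sort descending
    return evals[order], evecs[:, order]

# Toy stand-in for a trained SVM (assumption): two "support vectors" at
# +e1 and -e1 in 5-D, so the boundary is the hyperplane x1 = 0.
sv = np.array([[1.0, 0, 0, 0, 0], [-1.0, 0, 0, 0, 0]])
coef = np.array([1.0, -1.0])
X = np.random.default_rng(1).normal(size=(300, 5))

evals, evecs = dba_subspace(X, sv, coef, b=0.0, gamma=0.5)
print("eigenvalues:", np.round(evals, 6))
print("leading direction:", np.round(evecs[:, 0], 6))
```

In this toy case only the first coordinate carries class information, so a single eigenvalue dominates and the leading eigenvector aligns with the first axis (up to sign). With a trained SVM, the support vectors, dual coefficients, bias, and kernel width would come from the fitted model; the number of dominant eigenvalues would then suggest the dimension of the estimated subspace.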
