Stochastic discriminant analysis for linear supervised dimension reduction

Abstract In this paper, we consider a linear supervised dimension reduction method for classification settings: stochastic discriminant analysis (SDA). The method matches similarities between points in the projection space with those in a response space. The similarities are represented as joint probabilities obtained by transforming pairwise distances with a heavy-tailed transformation that resembles Student's t-distribution, and the matching is done by minimizing the Kullback–Leibler divergence between the two probability distributions. We compare SDA against several state-of-the-art methods for supervised linear dimension reduction. In our experiments, the performance of SDA is often better than, and typically at least equal to, that of the compared methods. We conducted experiments on data sets of low, medium, and high dimensionality, with widely varying numbers of samples, and with both sparse and dense data. When the data set contains several classes, the low-dimensional projections computed with SDA often yield higher classification accuracies than those obtained with the compared methods.
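To make the construction concrete, the following is a minimal sketch of the objective the abstract describes; the notation (W, x_i, p_{ij}, q_{ij}) is introduced here for illustration and is not taken from the paper. Given a linear projection W, pairwise distances between projected points are turned into joint probabilities with a heavy-tailed, Student's t-like kernel,

q_{ij} = \frac{\left(1 + \lVert W^\top x_i - W^\top x_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert W^\top x_k - W^\top x_l \rVert^2\right)^{-1}},

while target probabilities p_{ij} are defined analogously from the response space (e.g., larger when x_i and x_j share a class label). The projection W is then found by minimizing the Kullback–Leibler divergence between the two distributions,

\min_{W} \; D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.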
