Enhancing information discriminant analysis: Feature extraction with linear statistical model and information-theoretic criteria

In this paper, we develop a novel feature transformation method for supervised linear dimensionality reduction. Existing methods, e.g., Information Discriminant Analysis (IDA), estimate the first- and second-order statistics of the data in the original high-dimensional space and then design the transformation matrix based on information-theoretic criteria. Unfortunately, such transformation methods are sensitive to the accuracy of the statistics estimation. To overcome this disadvantage, our method describes the statistical structure of the transformed low-dimensional subspace via a linear statistical model, which reduces the number of unknown parameters, while simultaneously maximizing the mutual information (MI) between the transformed data and their class labels, which ensures between-class separability according to information theory. The key idea is to seek the optimal model parameters, including the transformation matrix, via joint optimization of the MI function and the log-likelihood function; the method can therefore not only reduce estimation errors but also maximize between-class separability. Experimental results on a synthetic dataset and benchmark datasets demonstrate that our method outperforms other related methods.

Highlights:
We develop a novel feature transformation method for linear dimensionality reduction.
The statistics in the transformed subspace are learned to reduce unknown parameters.
The transformation matrix is obtained via joint optimization of MI and likelihood.
Our method can maximize between-class separability as well as reduce estimation errors.
Experimental results show our method performs better than other related methods.
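To make the joint-optimization idea concrete, here is a minimal sketch, not the authors' exact algorithm: it approximates the MI between the projected data and the class labels under Gaussian class-conditional assumptions (I(Z; C) ≈ H(Z) − Σ_c p_c H(Z | C = c)), adds the average per-class Gaussian log-likelihood of the projected points as the linear statistical model term, and optimizes the combined objective over the transformation matrix with SciPy. The function names (`joint_objective`, `fit_transform_matrix`), the trade-off weight `lam`, and the Gaussian modeling choice are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: Gaussian MI approximation + subspace log-likelihood,
# jointly optimized over a linear projection W. Assumptions are noted inline.
import numpy as np
from scipy.optimize import minimize

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def joint_objective(w_flat, X, y, k, lam=0.1, reg=1e-6):
    """Negative of: Gaussian MI approximation + lam * mean log-likelihood."""
    n, d = X.shape
    W = w_flat.reshape(k, d)
    # Keep the projection well-conditioned via QR orthonormalization.
    W = np.linalg.qr(W.T)[0].T
    Z = X @ W.T                                   # projected data, shape (n, k)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n
    # I(Z; C) ~= H(Z) - sum_c p_c H(Z | C = c) under Gaussian assumptions.
    total_cov = np.atleast_2d(np.cov(Z, rowvar=False)) + reg * np.eye(k)
    mi = gaussian_entropy(total_cov)
    loglik = 0.0
    for c, p in zip(classes, priors):
        Zc = Z[y == c]
        cov_c = np.atleast_2d(np.cov(Zc, rowvar=False)) + reg * np.eye(k)
        mi -= p * gaussian_entropy(cov_c)
        # Linear-Gaussian model term: per-class log-density in the subspace.
        diff = Zc - Zc.mean(axis=0)
        quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov_c), diff)
        _, logdet = np.linalg.slogdet(cov_c)
        loglik += p * np.mean(-0.5 * (quad + logdet + k * np.log(2 * np.pi)))
    return -(mi + lam * loglik)

def fit_transform_matrix(X, y, k, lam=0.1, seed=0):
    """Learn a k x d projection by joint MI / likelihood optimization."""
    rng = np.random.default_rng(seed)
    w0 = rng.standard_normal((k, X.shape[1])).ravel()
    res = minimize(joint_objective, w0, args=(X, y, k, lam), method='L-BFGS-B')
    W = res.x.reshape(k, X.shape[1])
    return np.linalg.qr(W.T)[0].T                 # orthonormal rows
```

In this sketch, `lam` controls the trade-off between between-class separability (the MI term) and the fit of the statistical model in the subspace (the likelihood term); setting `lam = 0` recovers a purely MI-driven criterion similar in spirit to IDA.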

[1] Kari Torkkola, et al. Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions, 2001, NIPS.

[2] Zoran Nenadic, et al. Approximate information discriminant analysis: A computationally simple heteroscedastic feature extraction technique, 2008, Pattern Recognit.

[3] Hua Yu, et al. A direct LDA algorithm for high-dimensional data - with application to face recognition, 2001, Pattern Recognit.

[4] Samuel Kaski, et al. Informative Discriminant Analysis, 2003, ICML.

[5] Lawrence Carin, et al. Multi-Task Learning for Classification with Dirichlet Process Priors, 2007, J. Mach. Learn. Res.

[6] Larry A. Rendell, et al. The Feature Selection Problem: Traditional Methods and a New Algorithm, 1992, AAAI.

[7] Michael E. Tipping, et al. Mixtures of Principal Component Analysers, 1997.

[8] Josef Kittler, et al. Pattern Recognition: A Statistical Approach, 1982.

[9] Kari Torkkola, et al. Feature Extraction by Non-Parametric Mutual Information Maximization, 2003, J. Mach. Learn. Res.

[10] Pierre Comon, et al. Independent component analysis, A new concept?, 1994, Signal Process.

[11] E. Oja, et al. Independent Component Analysis, 2013.

[12] Xuelong Li, et al. Geometric Mean for Subspace Selection, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] A. Robert Calderbank, et al. Nonlinear Information-Theoretic Compressive Measurement Design, 2014, ICML.

[14] Martin E. Hellman, et al. Probability of error, equivocation, and the Chernoff bound, 1970, IEEE Trans. Inf. Theory.

[15] Dorothy T. Thayer, et al. EM algorithms for ML factor analysis, 1982.

[17] Antonio Artés-Rodríguez, et al. Maximization of Mutual Information for Supervised Linear Feature Extraction, 2007, IEEE Transactions on Neural Networks.

[18] Jian Yang, et al. Median-mean line based discriminant analysis, 2014, Neurocomputing.

[19] Richard M. Everson, et al. Independent Component Analysis: Principles and Practice, 2001.

[20] Te-Won Lee, et al. Independent Component Analysis, 1998, Springer US.

[21] Christopher M. Bishop, et al. Mixtures of Probabilistic Principal Component Analyzers, 1999, Neural Computation.

[22] David B. Dunson, et al. Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds, 2010, IEEE Transactions on Signal Processing.

[23] Claude E. Shannon, et al. The Mathematical Theory of Communication, 1950.

[24] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems, 1936.

[25] Zoran Nenadic, et al. Information Discriminant Analysis: Feature Extraction with an Information-Theoretic Objective, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] A. Robert Calderbank, et al. Communications Inspired Linear Discriminant Analysis, 2012, ICML.

[27] Terrence J. Sejnowski, et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[28] J. E. Jackson. A User's Guide to Principal Components, 1991.

[29] Deniz Erdogmus, et al. Feature extraction using information-theoretic learning, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Soo-Young Lee, et al. Discriminant Independent Component Analysis, 2011, IEEE Trans. Neural Networks.