Discriminant Independent Component Analysis

A conventional linear model based on negentropy maximization extracts statistically independent latent variables, which may not be optimal for a discriminant model with good classification performance. In this paper, a single-stage linear semi-supervised method for extracting discriminative independent features is proposed. Discriminant independent component analysis (dICA) provides a framework for linearly projecting multivariate data onto a lower-dimensional space where the features are maximally discriminant with minimal redundancy. The optimization problem is formulated as the maximization of a linear combination of negentropy and a weighted functional measure of classification. Motivated by the independence among the extracted features, the Fisher linear discriminant is used as the functional measure of classification. Experimental results show improved classification performance when dICA features are used for recognition tasks, in comparison to unsupervised techniques (principal component analysis and ICA) and supervised feature extraction techniques such as linear discriminant analysis (LDA), conditional ICA, and those based on information-theoretic learning. dICA features also yield lower data reconstruction error than LDA and the ICA method based on negentropy maximization.
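The objective described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the common FastICA-style negentropy approximation J(z) ≈ (E[G(z)] − E[G(ν)])² with G = log cosh, the standard two-class Fisher ratio, synthetic data, and a toy random search in place of a proper gradient-based optimizer. All names (`dica_objective`, `lam`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data in 4 dimensions (hypothetical stand-in for real features).
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 4)),
               rng.normal(+1.0, 1.5, size=(n, 4))])
y = np.array([0] * n + [1] * n)

# Center and whiten (standard ICA preprocessing).
X = X - X.mean(axis=0)
d, E = np.linalg.eigh(np.cov(X, rowvar=False))
X_white = X @ E @ np.diag(d ** -0.5)

# E[log cosh(nu)] for a standard Gaussian nu, estimated once by Monte Carlo.
G_GAUSS = np.mean(np.log(np.cosh(rng.normal(size=100_000))))

def negentropy(z):
    """FastICA-style approximation: (E[G(z)] - E[G(nu)])^2 with G = log cosh."""
    return (np.mean(np.log(np.cosh(z))) - G_GAUSS) ** 2

def fisher(z, y):
    """Two-class Fisher criterion (m0 - m1)^2 / (s0^2 + s1^2) for a 1-D feature."""
    z0, z1 = z[y == 0], z[y == 1]
    return (z0.mean() - z1.mean()) ** 2 / (z0.var() + z1.var())

def dica_objective(w, X, y, lam=1.0):
    """Weighted sum of negentropy and Fisher score, as in the dICA formulation."""
    z = X @ (w / np.linalg.norm(w))
    return negentropy(z) + lam * fisher(z, y)

# Toy optimization: random search over unit directions. A real implementation
# would use a fixed-point or gradient ascent update on the combined objective.
best_w, best_val = None, -np.inf
for _ in range(500):
    w = rng.normal(size=4)
    val = dica_objective(w, X_white, y)
    if val > best_val:
        best_w, best_val = w, val
```

Extracting several features would additionally require a decorrelation step between directions (as in deflationary FastICA), which is omitted here for brevity.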
