An improved feature transformation method using mutual information

Feature transformation is an important step in pattern recognition systems. A feature transformation matrix can be obtained using different criteria, such as discrimination between classes, feature independence, or mutual information between features and classes. The resulting matrix can also be used for feature reduction. In this paper, we propose a new method for finding a feature transformation based on mutual information (MI). We assume that the class-conditional probability density function (PDF) of the features is Gaussian, and then use gradient ascent to maximize the mutual information between the features and the classes. Experimental results show that the proposed MI projection consistently outperforms other methods in a variety of cases. On the UCI Glass database, classification accuracy improves by up to 7.95 %, and on TIMIT the phoneme recognition rate improves by 3.55 %.
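The idea of maximizing MI by gradient ascent under a Gaussian assumption can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract only states that the class-conditional PDFs are Gaussian, so the sketch additionally assumes that the marginal of the projected features is Gaussian, which gives the closed-form surrogate I(A) ≈ ½ log|A Σ Aᵀ| − ½ Σ_c p_c log|A Σ_c Aᵀ|. The function names, the step-halving rule, and the default parameters are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_mi(A, priors, class_covs, total_cov):
    """Surrogate MI between projected features A x and the class label.

    Assumption (not from the paper): both the class-conditional and the
    marginal densities of the projected features are treated as Gaussian.
    """
    _, logdet_total = np.linalg.slogdet(A @ total_cov @ A.T)
    logdet_classes = [np.linalg.slogdet(A @ S @ A.T)[1] for S in class_covs]
    return 0.5 * (logdet_total - np.dot(priors, logdet_classes))

def gaussian_mi_grad(A, priors, class_covs, total_cov):
    """Gradient of gaussian_mi with respect to A.

    Uses d/dA (0.5 * log|A S A^T|) = (A S A^T)^{-1} A S for symmetric S.
    """
    grad = np.linalg.solve(A @ total_cov @ A.T, A @ total_cov)
    for p, S in zip(priors, class_covs):
        grad -= p * np.linalg.solve(A @ S @ A.T, A @ S)
    return grad

def fit_mi_projection(X, y, out_dim, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the surrogate MI; returns (A, objective value).

    X has shape (n_samples, n_features) with n_features >= 2.
    A step is accepted only if it increases the objective; otherwise the
    step size is halved (an illustrative safeguard, not from the paper).
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    class_covs = [np.cov(X[y == c], rowvar=False) for c in classes]
    total_cov = np.cov(X, rowvar=False)

    A = rng.standard_normal((out_dim, X.shape[1]))  # random initial projection
    best = gaussian_mi(A, priors, class_covs, total_cov)
    for _ in range(steps):
        G = gaussian_mi_grad(A, priors, class_covs, total_cov)
        candidate = A + lr * G
        value = gaussian_mi(candidate, priors, class_covs, total_cov)
        if value > best:
            A, best = candidate, value
        else:
            lr *= 0.5
    return A, best
```

Calling `fit_mi_projection(X, y, out_dim=k)` with `k` smaller than the input dimension yields a `k × d` matrix, which corresponds to the feature-reduction use mentioned in the abstract. Because the surrogate is invariant to any invertible mixing of the rows of `A`, the result is determined only up to such a transformation.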
