Optimal feature sub-space selection based on discriminant analysis

The performance of a speech recogniser, or of any other pattern classifier, depends strongly on its input features: for good performance, the feature set must be both highly discriminative and compact. Linear discriminant analysis (LDA) is a common data-driven method for finding linear transformations that map large feature vectors onto smaller ones while retaining most of their discriminative power. LDA, however, oversimplifies the problem by condensing all class information into only two scatter matrices, thereby losing important information about the individual class distributions. We therefore propose a new approach, based on the mutual information or minimum classification error paradigm, that takes all information on the individual class distributions into account while searching for an optimal sub-space, thus avoiding the crude approximations made by LDA. Experiments show that the proposed scheme provides more discriminative feature vectors, leading to substantially better recognition results.
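
For readers unfamiliar with the "two scatter matrices" mentioned above, the textbook LDA formulation can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: $x$ is a feature vector, $\mu_c$ and $N_c$ are the mean and sample count of class $c$, $\mu$ is the global mean, and $W$ is the linear projection onto the smaller sub-space. This is one common form of the criterion; other equivalent variants exist.

```latex
% Within-class and between-class scatter (standard textbook definitions)
S_W = \sum_{c} \sum_{x \in \text{class } c} (x - \mu_c)(x - \mu_c)^{\top},
\qquad
S_B = \sum_{c} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}

% One common LDA criterion: choose the projection W that maximises
\hat{W} = \arg\max_{W} \;
  \frac{\det\!\left( W^{\top} S_B W \right)}{\det\!\left( W^{\top} S_W W \right)}
```

Under this criterion the columns of $\hat{W}$ are the leading eigenvectors of $S_W^{-1} S_B$. The criterion sees each class only through its mean and a single pooled scatter matrix, so any class-specific covariance structure is lost, which is the simplification the abstract refers to.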
