Modified estimation of between-class covariance matrix in linear discriminant analysis of speech

Linear discriminant analysis (LDA) is a popular supervised feature transformation in current automatic speech recognition (ASR). The LDA parameters are generally estimated from training data partitioned into classes. If the number of classes is smaller than the dimension of the supervectors, as is typical in phoneme-based LDA, then the between-class covariance matrix, whose rank cannot exceed the number of classes minus one, is singular or close to singular (the singularity problem of classical LDA). In this paper, we present a modified estimation of the between-class covariance matrix, which represents one possible approach to solving the singularity problem. Our method works directly with the supervectors instead of the class mean vectors. Because more data are used in the computation, the number of estimation cycles is much larger, so the structure of the matrix can be significantly refined. As a result, longer contexts can be used while the singularity problem is efficiently eliminated. The effectiveness of the proposed estimation is evaluated on Slovak phoneme-based and triphone-based large vocabulary continuous speech recognition (LVCSR) tasks. The method is compared with state-of-the-art MFCCs and with LDA trained in the standard way. The experimental results confirm that the modified LDA considerably outperforms the MFCCs and consistently improves on the conventional LDA.
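The singularity argument can be made concrete with a minimal sketch of the classical, class-mean-based estimation that the paper sets out to modify. It assumes NumPy, builds supervectors by stacking each frame with +/-`context` neighbouring frames (the edge padding is our own assumption), estimates the between-class matrix from class means, and solves the LDA eigenproblem S_b v = λ S_w v after whitening with the within-class matrix. The function names `stack_context`, `scatter_matrices` and `lda_projection` are hypothetical, and the data are random numbers rather than speech. The proposed supervector-based estimator is not reproduced here, since the abstract does not describe it in enough detail.

```python
import numpy as np


def stack_context(frames: np.ndarray, context: int) -> np.ndarray:
    """Concatenate each frame with its +/-`context` neighbours into one supervector.

    Edges are padded by repeating the first/last frame (an assumption of this sketch).
    """
    n = frames.shape[0]
    padded = np.vstack([np.repeat(frames[:1], context, axis=0),
                        frames,
                        np.repeat(frames[-1:], context, axis=0)])
    # Block k of row t holds frames[t + k - context].
    return np.hstack([padded[k:k + n] for k in range(2 * context + 1)])


def scatter_matrices(x: np.ndarray, labels: np.ndarray):
    """Classical estimates: between-class matrix from class means, within-class from deviations."""
    n, dim = x.shape
    mu = x.mean(axis=0)
    s_b = np.zeros((dim, dim))
    s_w = np.zeros((dim, dim))
    for c in np.unique(labels):
        xc = x[labels == c]
        mu_c = xc.mean(axis=0)
        diff = (mu_c - mu)[:, None]
        s_b += len(xc) * (diff @ diff.T)   # one rank-1 term per class
        dev = xc - mu_c
        s_w += dev.T @ dev
    return s_b / n, s_w / n


def lda_projection(s_b: np.ndarray, s_w: np.ndarray):
    """Solve S_b v = lambda S_w v by whitening with the Cholesky factor of S_w.

    Returns eigenvalues in descending order and the matching columns v.
    """
    l_inv = np.linalg.inv(np.linalg.cholesky(s_w))   # S_w = L L^T
    m = l_inv @ s_b @ l_inv.T
    m = 0.5 * (m + m.T)                               # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(m)                  # ascending order
    return evals[::-1], l_inv.T @ evecs[:, ::-1]


# Synthetic demo (random numbers, not speech): 13-dim frames, +/-4 frames of context.
rng = np.random.default_rng(0)
frames = rng.standard_normal((2000, 13))
labels = rng.integers(0, 40, size=2000)   # 40 hypothetical phoneme classes
x = stack_context(frames, context=4)
s_b, s_w = scatter_matrices(x, labels)
evals, _ = lda_projection(s_b, s_w)
print(x.shape)                               # (2000, 117)
print(np.linalg.matrix_rank(s_b))            # at most 39, far below 117
print(np.sum(evals > 1e-8 * evals[0]))       # at most 39 informative LDA directions
```

Because the class-mean deviations, weighted by class size, sum to zero, the between-class matrix has rank at most C - 1 regardless of the supervector dimension. Widening the context therefore enlarges the supervector while this bound stays fixed, and any LDA output dimension beyond C - 1 is filled with directions that carry no between-class information.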
