Application of a locality preserving discriminant analysis approach to ASR

This paper presents a comparison of three techniques for dimensionality reduction in feature analysis for automatic speech recognition (ASR). All three approaches estimate a linear transformation that is applied to concatenated log spectral features, providing a mechanism for efficient modeling of spectral dynamics in ASR. The goal of the paper is to investigate the effectiveness of a discriminative approach for estimating these feature space transformations, based on the assumption that speech features lie on a non-linear manifold. This approach, referred to as locality preserving discriminant analysis (LPDA), rests on the principle of preserving local within-class relationships in this non-linear space while simultaneously maximizing separability between classes. It was compared to two well-known dimensionality reduction techniques, linear discriminant analysis (LDA) and locality preserving projection (LPP), on the Aurora 2 speech-in-noise task. The LPDA approach was found to provide a significant reduction in word error rate (WER) with respect to the other techniques for most noise types and signal-to-noise ratios (SNRs).
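To make the baseline concrete, the sketch below illustrates the classic LDA transform named in the abstract: it estimates a linear projection that maximizes between-class scatter relative to within-class scatter via a generalized eigenproblem. This is an illustrative stand-in only, not the paper's LPDA method; the data shapes, class labels, and ridge regularization are assumptions.

```python
# Minimal LDA sketch (illustrative; not the paper's LPDA estimator).
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, n_components):
    """Estimate a linear projection maximizing between-class scatter
    relative to within-class scatter (classic LDA)."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += Xc.shape[0] * np.outer(mc - mu, mc - mu)
    Sw += 1e-6 * np.eye(d)  # small ridge for numerical stability (assumption)
    # Generalized eigenproblem Sb v = lambda Sw v; eigh returns ascending order
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, ::-1][:, :n_components]  # top eigenvectors as projection

# Toy usage: two synthetic "feature" classes projected to one dimension
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
W = lda_transform(X, y, 1)
Z = X @ W  # reduced features
```

In the paper's setting the rows of `X` would be concatenated log spectral frames and the labels would be phonetic classes; LPDA replaces the global scatter matrices with graph-weighted, locality-preserving counterparts.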
