Neural Network Based Nonlinear Discriminant Analysis for Speech Recognition

Neural networks have been among the most successful recognition models for automatic speech recognition because of their high discriminative power and adaptive learning capability. In many speech recognition tasks, especially isolated speech classification, neural networks have proven very effective at classifying short-time acoustic-phonetic units such as individual phonemes. Moreover, neural networks are well suited to dimensionality reduction. In contrast to linear dimensionality reduction techniques such as Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA), neural network based nonlinear reduction approaches can form a dimensionally reduced representation of complex data such as speech features while preserving the variability and discriminability of the original data. In this paper, a neural network is combined with Hidden Markov Models (HMMs) in a continuous phonetic speech recognition system: the network is trained as a classifier with phonetic labeling information so as to maximize discrimination among the speech features used for HMM-based recognition. Additionally, the network reduces the dimensionality of the speech features, with the goal of creating a compact set of highly discriminative features for accurate recognition. Experimental evaluation on the TIMIT database shows that this combination of neural networks and HMMs is effective in improving recognition accuracy.
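The approach the abstract describes can be sketched as a bottleneck classifier: a network trained on phoneme labels whose low-dimensional hidden layer then serves as a compact, discriminative feature transform for the HMM stage. The following numpy-only sketch illustrates the idea; the layer sizes, learning rate, toy data, and class/function names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): a small MLP with a
# low-dimensional "bottleneck" hidden layer, trained to classify phoneme
# labels. After training, the bottleneck activations are extracted as
# reduced-dimension discriminative features for a downstream HMM recognizer.

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class BottleneckMLP:
    def __init__(self, n_in, n_bottleneck, n_classes, lr=0.1):
        self.W1 = rng.normal(0, 0.1, (n_in, n_bottleneck))
        self.b1 = np.zeros(n_bottleneck)
        self.W2 = rng.normal(0, 0.1, (n_bottleneck, n_classes))
        self.b2 = np.zeros(n_classes)
        self.lr = lr

    def bottleneck(self, X):
        # tanh bottleneck layer: this is the nonlinear reduced representation
        return np.tanh(X @ self.W1 + self.b1)

    def train_step(self, X, Y):
        # forward pass
        h = self.bottleneck(X)
        p = softmax(h @ self.W2 + self.b2)
        loss = -np.mean(np.log(p[np.arange(len(X)), Y.argmax(1)] + 1e-12))
        # backward pass: cross-entropy gradient through softmax, then tanh
        d2 = (p - Y) / len(X)
        d1 = (d2 @ self.W2.T) * (1.0 - h ** 2)
        self.W2 -= self.lr * (h.T @ d2)
        self.b2 -= self.lr * d2.sum(axis=0)
        self.W1 -= self.lr * (X.T @ d1)
        self.b1 -= self.lr * d1.sum(axis=0)
        return loss

# Toy data standing in for 39-dim acoustic frames (e.g. MFCC+deltas)
# with 5 phoneme classes; the labels here depend on the features so the
# classifier has something learnable.
X = rng.normal(size=(200, 39))
y = X[:, :5].argmax(axis=1)
Y = np.eye(5)[y]

net = BottleneckMLP(n_in=39, n_bottleneck=8, n_classes=5)
losses = [net.train_step(X, Y) for _ in range(50)]

features = net.bottleneck(X)   # compact features handed to the HMM stage
print(features.shape)          # (200, 8)
```

In a tandem-style system of this kind, `features` (or the network's class posteriors) would replace the raw acoustic features as HMM observations; the bottleneck width trades compactness against discriminability.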
