Dimensionality reduction methods for HMM phonetic recognition

This paper presents two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system. The neural networks are trained as feature classifiers, both to reduce feature dimensionality and to maximize discrimination among speech features. The outputs of different network layers are used to obtain the transformed features. Moreover, the networks are trained with category targets that correspond to individual HMM states, so that the trained networks better accommodate the temporal variability of features and yield more discriminative features in a low-dimensional space. Experimental evaluation on the TIMIT database shows that recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods. The highest phone accuracy on TIMIT, for 39 phone classes, was 74.9%, obtained with state-specific targets and a large number of training iterations.
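The abstract does not give the network topology, training settings, or exactly which layers are tapped, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: a single tanh hidden layer narrower than the input acts as the reduced feature space, the network is trained as a classifier on hypothetical HMM state labels (4 phones with 3 states each, so 12 targets), and the data are synthetic 24-dimensional vectors rather than real speech features. The names `make_toy_data`, `train`, `forward`, and `pca_reduce` are illustrative and are not the authors' code. PCA stands in for the linear baselines, which the abstract does not name.

```python
import numpy as np

def make_toy_data(n_phones=4, n_states=3, dim=24, per_class=50, seed=1):
    # Hypothetical synthetic data: one Gaussian cluster per HMM state target.
    rng = np.random.default_rng(seed)
    n_classes = n_phones * n_states
    means = rng.normal(0.0, 1.0, (n_classes, dim))
    X = np.concatenate([m + 0.5 * rng.normal(size=(per_class, dim)) for m in means])
    y_state = np.repeat(np.arange(n_classes), per_class)
    y_phone = y_state // n_states  # state target k belongs to phone k // n_states
    return X, y_state, y_phone

def init_params(n_in, n_hidden, n_out, rng):
    # Small random weights; biases start at zero.
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(X, p):
    H = np.tanh(X @ p["W1"] + p["b1"])       # low-dimensional hidden layer
    Z = H @ p["W2"] + p["b2"]                 # one logit per state target
    Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability for softmax
    P = np.exp(Z)
    P /= P.sum(axis=1, keepdims=True)         # softmax posteriors
    return H, P

def train(X, y, n_hidden, n_out, epochs=500, lr=0.5, seed=0):
    # Full-batch gradient descent on mean cross-entropy (assumed settings).
    rng = np.random.default_rng(seed)
    p = init_params(X.shape[1], n_hidden, n_out, rng)
    T = np.eye(n_out)[y]                      # one-hot state targets
    losses = []
    for _ in range(epochs):
        H, P = forward(X, p)
        losses.append(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))
        dZ = (P - T) / len(y)                 # gradient wrt logits
        dH = (dZ @ p["W2"].T) * (1.0 - H ** 2)  # back through tanh (uses old W2)
        p["W2"] -= lr * H.T @ dZ
        p["b2"] -= lr * dZ.sum(axis=0)
        p["W1"] -= lr * X.T @ dH
        p["b1"] -= lr * dH.sum(axis=0)
    return p, losses

def pca_reduce(X, k):
    # Linear baseline: project centered data onto its top-k principal axes.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

X, y_state, y_phone = make_toy_data()
params, losses = train(X, y_state, n_hidden=10, n_out=12)
hidden_feats, state_post = forward(X, params)   # two candidate feature sets
train_acc = np.mean(state_post.argmax(axis=1) == y_state)
pca_feats = pca_reduce(X, 10)                    # same target dimension, linear
```

In this sketch either `hidden_feats` (the 10-dimensional hidden layer) or `state_post` (the output layer) could serve as the transformed features handed to the HMM; training on state labels rather than phone labels is the sketch's analogue of the state-specific targets described above.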