Paper Information - Phoneme Recognition Using Neural Network and Sequence Learning Model

Phoneme Recognition Using Neural Network and Sequence Learning Model

This thesis describes a biologically motivated approach to phoneme recognition that combines a self-organized neural network with a sequence learning algorithm. Phoneme recognition in continuous speech is a difficult task, and recognition accuracy is typically low. Adding sequential information about individual phonemes through the sequence learning algorithm can improve recognition performance. The thesis covers three parts. First, a self-organized neural network classifies the input sound waves into 42 phoneme categories. Second, the outputs of the network's 42 output neurons are sent to the Sequence Learning block, which is composed of Long Term Memory (LTM) cells. Finally, each LTM cell sends a unique feedback strength signal to each output of the neural network to predict the next phoneme, so that phoneme recognition improves based on the sequential information.
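The feedback idea in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm: a simple table of phoneme-to-phoneme transition counts stands in for the LTM-cell Sequence Learning block, and the per-frame score lists stand in for the outputs of the self-organized network. All function names, the smoothing constant, and the multiplicative way of combining scores with feedback are assumptions made for illustration only.

```python
# Hypothetical sketch: a bigram count table is used as a stand-in for the
# LTM-cell sequence learning block described in the abstract.
NUM_PHONEMES = 42  # number of output categories stated in the abstract


def learn_transitions(sequences, num_classes=NUM_PHONEMES):
    """Count how often phoneme j directly follows phoneme i in training data."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def feedback_strengths(counts, prev, smoothing=1.0):
    """Return one feedback weight per output for the phoneme after `prev`.

    Add-constant smoothing (an assumption) keeps unseen transitions possible.
    The weights sum to 1.
    """
    row = [c + smoothing for c in counts[prev]]
    total = sum(row)
    return [c / total for c in row]


def recognize(score_frames, counts):
    """Pick one phoneme per frame, re-weighting each frame's scores by the
    feedback predicted from the previously chosen phoneme."""
    decoded = []
    prev = None
    for scores in score_frames:
        if prev is None:
            combined = list(scores)  # no history yet: use raw scores
        else:
            fb = feedback_strengths(counts, prev)
            combined = [s * f for s, f in zip(scores, fb)]
        prev = max(range(len(combined)), key=combined.__getitem__)
        decoded.append(prev)
    return decoded


# Toy usage: phoneme 7 always follows phoneme 3 in the training sequence.
counts = learn_transitions([[3, 7, 3, 7, 3, 7]])
frame1 = [0.0] * NUM_PHONEMES
frame1[3] = 0.9
frame2 = [0.01] * NUM_PHONEMES
frame2[7] = 0.40  # slightly weaker raw score than phoneme 8
frame2[8] = 0.45
print(recognize([frame1, frame2], counts))  # prints [3, 7]
```

In this toy case the raw scores of the second frame alone favor phoneme 8, but the feedback learned from the training sequence shifts the decision to phoneme 7, which is the kind of correction the abstract attributes to sequential information.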

Yiming Huang
