Speech Analyser in an ICAI System for TESOL

There is increasing demand for computer software that can give English as a Second Language (ESL) speakers useful, personalised feedback on the prosodic aspects of their speech. Such software would compensate for the shortage of ESL teachers and reduce the cost of learning. This thesis concentrates on constructing such an Intelligent Computer Aided Instruction (ICAI) prototype system, focusing in particular on one component: the Speech Analyser. The Speech Analyser recognises a user's speech, identifies the rhythmic stress pattern in it, discovers stress and rhythm errors, and produces reports for another component, which generates personalised feedback advising the user on how to improve the prosodic aspects of their speech effectively.

We build a Hidden Markov Model (HMM) based speech recogniser to recognise the user's speech. The parameters used to construct the recogniser are investigated through an exhaustive experiment run on a client/server computing network, and this exploration suggests that the choice of parameters is very important.

We build stress detectors that detect the rhythmic stress pattern in the user's speech using both Support Vector Machine (SVM) and Decision Tree (DT) techniques. The SVM-based detector outperforms the DT-based one, which suggests that SVM is better suited than DT to a relatively large data set consisting entirely of numeric data.

We build an error identifier to identify stress and rhythm errors in the user's speech automatically. A two-layer phoneme alignment algorithm based on the Needleman-Wunsch technique is developed to support prosodic error identification. Our study also suggests that the foot comparison method is better than the Vowel Onset Point comparison method for automatically identifying the main rhythm errors in the user's speech.
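The Needleman-Wunsch technique named above is a dynamic-programming method for the global alignment of two sequences. Below is a minimal single-layer sketch in Python that aligns an expected phoneme sequence with a recognised one. It assumes a simple scoring scheme (match +1, mismatch -1, gap -1) chosen purely for illustration and does not reproduce the thesis's two-layer algorithm. The function name `needleman_wunsch` and the ARPAbet-style phoneme labels are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical scoring scheme, chosen for illustration only.
MATCH, MISMATCH, GAP = 1, -1, -1


def needleman_wunsch(
    ref: Sequence[str], hyp: Sequence[str]
) -> Tuple[int, List[Tuple[Optional[str], Optional[str]]]]:
    """Globally align two phoneme sequences (Needleman-Wunsch).

    Returns the optimal alignment score and a list of (ref, hyp) pairs,
    where None marks a gap, i.e. a deleted or inserted phoneme.
    """
    n, m = len(ref), len(hyp)
    # score[i][j] = best score for aligning ref[:i] with hyp[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if ref[i - 1] == hyp[j - 1] else MISMATCH
            diag = score[i - 1][j - 1] + sub  # pair the two phonemes
            up = score[i - 1][j] + GAP        # expected phoneme missing (deletion)
            left = score[i][j - 1] + GAP      # extra recognised phoneme (insertion)
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            MATCH if ref[i - 1] == hyp[j - 1] else MISMATCH
        ):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    expected = ["B", "AH", "N", "AE", "N", "AH"]  # "banana"
    recognised = ["B", "N", "AE", "N", "AH"]      # first vowel dropped
    print(needleman_wunsch(expected, recognised))
```

In this example, the expected phonemes of *banana* (`B AH N AE N AH`) are aligned with a rendering that drops the first vowel (`B N AE N AH`). The result is a score of 4, and the missing `AH` is paired with a gap. Because every expected phoneme ends up matched to a recognised one or to a gap, an error identifier could then compare prosodic features phoneme by phoneme.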

Huayang Xie
