Automatic lexical stress detection using acoustic features for computer-assisted language learning

This paper proposes an English lexical stress detection approach using acoustic features. The approach classifies the vowels of English words into two patterns: primary stress and unstressed. We first take the frame-averaged basic feature set of each syllable nucleus in polysyllabic words as the baseline for deciding the stress pattern. This feature set includes the semitone, the duration, the loudness and the emphasis feature. We then introduce a pitch-variation feature set and a context-aware feature set to describe, respectively, the variation inside the syllable nucleus and its surrounding context. Combining the three feature sets improves the accuracy rate by 7% to 8%. In addition, we train a separate support vector machine (SVM) classifier for each vowel phoneme. The results show that the phoneme-dependent models perform better than a single shared model. Finally, our system achieves an accuracy of 88.6% when compared with human-tagged labels.
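The three feature sets described above can be illustrated with a minimal sketch. It is not the authors' implementation: the reference frequency, the frame shift, the use of mean frame energy as a stand-in for loudness, the pitch range as a stand-in for the pitch-variation set, and the difference from the word-level mean as a stand-in for the context-aware set are all assumptions made here for illustration. The emphasis feature is omitted because the abstract does not define it. Only the semitone conversion, 12 * log2(f / f_ref), is standard.

```python
import math
from statistics import mean

REF_HZ = 100.0  # assumed reference frequency; the abstract does not state one


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert a pitch value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def nucleus_features(f0_track_hz, frame_energy, frame_shift_s=0.01):
    """Frame-averaged features for one syllable nucleus (hypothetical layout).

    f0_track_hz: per-frame pitch in Hz, with 0 marking unvoiced frames.
    frame_energy: per-frame energy, used here as a rough loudness proxy.
    """
    semis = [hz_to_semitones(f) for f in f0_track_hz if f > 0]
    return {
        "semitone": mean(semis) if semis else 0.0,
        "duration": len(f0_track_hz) * frame_shift_s,
        "loudness": mean(frame_energy),
        # Assumed stand-in for the pitch-variation feature set.
        "pitch_range": (max(semis) - min(semis)) if semis else 0.0,
    }


def add_context(features_per_nucleus):
    """Append each feature's difference from the word-level mean.

    This is an assumed stand-in for the context-aware feature set.
    """
    keys = list(features_per_nucleus[0])
    word_mean = {k: mean(f[k] for f in features_per_nucleus) for k in keys}
    return [
        {**f, **{k + "_rel": f[k] - word_mean[k] for k in keys}}
        for f in features_per_nucleus
    ]


# Toy two-syllable word: a high, long, loud nucleus followed by a low, short one.
word = add_context([
    nucleus_features([200.0, 200.0, 200.0, 200.0], [0.8, 0.8, 0.8, 0.8]),
    nucleus_features([100.0, 100.0], [0.2, 0.2]),
])
for nucleus in word:
    print(nucleus)
```

In a full system, vectors like these would be passed to one SVM per vowel phoneme, as the abstract describes; the classifier is left out here to keep the sketch dependency-free.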
