Referential Vowel Duration Ratio as a Feature for Automatic Assessment of L2 Word Prosody

This paper proposes two features for automatic assessment of second-language (L2) word prosody: a referential vowel duration ratio for a pair of vowels in consecutive syllables, and a weighted mean of these ratios on a logarithmic scale. Alongside the contours of fundamental frequency (F0) and energy, which carry suprasegmental information of speech, the segmental durations of syllables or phonemes provide important information for assessing L2 prosody. For L2 learners, the first step in learning prosody is to place accents or stresses on the appropriate syllables in words. A stressed syllable should be produced longer, and an unstressed syllable shorter. To capture this, we propose taking the duration ratio of every pair of consecutive vowels relative to the duration contrast of the same vowel pair produced by native speakers. Furthermore, we propose a weighted mean of the ratios on a logarithmic scale that takes into account the local importance of each pair within a word. In an evaluation with English word utterances produced by Japanese learners, introducing the weighted mean of the ratios significantly improved the correlation coefficient with subjective scores.
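The feature described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so several choices here are assumptions rather than the authors' method: each ratio is taken as the following vowel's duration over the preceding vowel's, the learner's ratio is divided by the native ratio for the same pair, the absolute value of the logarithm is used as a deviation measure, and the weights are supplied by the caller. The function names `referential_log_ratios` and `weighted_mean_abs_log_ratio` are hypothetical.

```python
import math


def referential_log_ratios(learner, native):
    """Log of (learner duration ratio / native duration ratio) per consecutive vowel pair.

    `learner` and `native` are lists of vowel durations (same unit, e.g. seconds)
    for the same word, in syllable order. The ratio direction (next / previous)
    is an assumption made for this sketch.
    """
    if len(learner) != len(native) or len(learner) < 2:
        raise ValueError("need two equal-length lists with at least two vowels")
    logs = []
    for i in range(len(learner) - 1):
        learner_ratio = learner[i + 1] / learner[i]
        native_ratio = native[i + 1] / native[i]
        # A value of 0 means the learner reproduces the native duration contrast.
        logs.append(math.log(learner_ratio / native_ratio))
    return logs


def weighted_mean_abs_log_ratio(learner, native, weights=None):
    """Weighted mean of absolute log referential ratios (assumed deviation measure).

    `weights` holds one non-negative weight per vowel pair; uniform weights are
    used when it is omitted. How the weights should reflect local importance
    within a word is not specified in the abstract.
    """
    logs = referential_log_ratios(learner, native)
    if weights is None:
        weights = [1.0] * len(logs)
    if len(weights) != len(logs):
        raise ValueError("need one weight per consecutive vowel pair")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * abs(x) for w, x in zip(weights, logs)) / total


# Hypothetical durations in seconds for a three-syllable word.
native_durations = [0.05, 0.10, 0.05]   # stress on the second syllable
learner_durations = [0.10, 0.10, 0.20]  # flat start, lengthened final vowel
score = weighted_mean_abs_log_ratio(learner_durations, native_durations, weights=[3.0, 1.0])
```

Working on a logarithmic scale makes a ratio twice as large as the native one and a ratio half as large count as equal deviations, which a linear scale would not do.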
