Automatic Assessment of L2 English Word Prosody Using Weighted Distances of F0 and Intensity Contours

This paper proposes an automatic prosody assessment method for learners of English based on a weighted comparison of fundamental frequency (F0) and intensity contours. Learners' F0 and intensity patterns are compared with those of native speakers using a proposed metric, a weighted distance, in which errors around high values of the prosodic features carry more weight in the computation of the final distance. Gold-standard native references are built using the k-means clustering algorithm. Because this requires choosing the number of clusters k, we also propose a data-driven criterion, called weighted variance, which is based on the weighted similarity within the whole set of native utterances and is used to determine the optimal k. Baseline contour comparison metrics yielded a subjective-objective score correlation of 0.278, whereas our method, combining the proposed metric and criterion, reached 0.304. For reference, subjective scores correlated at 0.480.
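The idea of a distance that penalizes errors near high contour values more heavily can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so everything here is an assumption for illustration only: the function name `weighted_distance` is hypothetical, the weights are taken as the min-max normalized native contour, and both contours are assumed to be already time-aligned and of equal length.

```python
import math

def weighted_distance(learner, native):
    """Weighted RMS distance between two equal-length contours.

    Assumption (not from the paper): each frame's weight is the
    min-max normalized value of the native contour, so errors near
    the native peaks count more than errors near its valleys.
    """
    if len(learner) != len(native) or not native:
        raise ValueError("contours must be non-empty and of equal length")

    lo, hi = min(native), max(native)
    span = hi - lo
    if span == 0:
        # Flat reference contour: fall back to uniform weights.
        weights = [1.0] * len(native)
    else:
        weights = [(v - lo) / span for v in native]

    total = sum(weights)
    squared_error = sum(
        w * (a - b) ** 2 for w, a, b in zip(weights, learner, native)
    )
    return math.sqrt(squared_error / total)
```

Under these assumptions, a unit error at the native peak yields a larger distance than the same error at the native minimum, which is the behavior the abstract describes qualitatively. A real system would also need contour extraction, normalization across speakers, and time alignment, none of which are shown here.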
