Rhythm analysis of second-language speech through low-frequency auditory features

Rhythm patterns play an important role in the perception of second-language (L2) speech. This paper presents a novel approach to evaluating L2 speech rhythm using low-frequency spectral features inspired by the rhythmogram auditory model, and investigates several new feature sets for training rhythm-centric acoustic models. Because they capture information over suprasegmental linguistic units appropriate for rhythmic analysis, including syllables and prosodic feet, these features can outperform traditional features in detecting rhythm errors on the ISLE corpus of learner English by 5-15% absolute.
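As a rough illustration of what a low-frequency spectral feature of speech can look like, the sketch below computes the magnitude spectrum of a signal's amplitude envelope below about 10 Hz, the range in which syllable- and foot-level rhythm is usually assumed to lie. It is not the feature extraction used in the paper, whose details the abstract does not give. The function name, the 100 Hz envelope frame rate, and the 10 Hz cutoff are all assumptions chosen for the example.

```python
import numpy as np

def low_frequency_envelope_spectrum(signal, sample_rate, max_hz=10.0, frame_hz=100):
    """Return (freqs, magnitudes) of the amplitude-envelope spectrum up to max_hz.

    Hypothetical helper for illustration only; parameter defaults are assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    # Rectify the waveform and average over non-overlapping frames
    # to obtain a coarse amplitude envelope.
    hop = int(sample_rate // frame_hz)
    n_frames = len(signal) // hop
    envelope = np.abs(signal[: n_frames * hop]).reshape(n_frames, hop).mean(axis=1)
    # Remove the mean so the DC component does not dominate the spectrum.
    envelope = envelope - envelope.mean()
    # Taper with a Hann window to reduce spectral leakage, then take the FFT.
    spectrum = np.abs(np.fft.rfft(envelope * np.hanning(n_frames)))
    freqs = np.fft.rfftfreq(n_frames, d=hop / sample_rate)
    # Keep only the low-frequency band of interest.
    keep = freqs <= max_hz
    return freqs[keep], spectrum[keep]
```

For a tone whose amplitude is modulated at 4 Hz, the returned spectrum should peak near 4 Hz. In a rhythm-analysis setting, the spectrum could instead be computed over a syllable- or foot-sized window, again as an assumption rather than the paper's stated procedure.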
