Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns

Second language learners of British English (BE) are typically taught four intonation classes: Glide-up, Glide-down, Dive and Take-off. We predict the intonation class of a learner’s utterance by modeling the temporal dependencies in its pitch pattern with a gated recurrent unit (GRU) network. We pre-train the GRU network on a set of synthesized pitch patterns representing each intonation class. To synthesize these patterns, we propose deriving them from the tone sequences that, according to domain knowledge, represent each intonation class. Experiments are conducted on expert speech data collected from spoken English training material for teaching BE intonation. With pre-training, the proposed scheme achieves absolute improvements in unweighted average recall (UAR) of 4.14% and 6.01% over the proposed approach without pre-training and over a baseline scheme that uses hidden Markov models (HMMs), respectively.
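For readers unfamiliar with the evaluation metric, UAR is the mean of the per-class recalls, so each of the four intonation classes contributes equally regardless of how many utterances it has. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `unweighted_average_recall` and the toy labels are assumptions made purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # number of reference utterances per class
    correct = defaultdict(int)  # number of correctly predicted utterances per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall is computed per class, then averaged without class-size weighting.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["Glide-up", "Glide-up",
          "Glide-down", "Glide-down", "Glide-down", "Glide-down",
          "Dive", "Take-off"]
y_pred = ["Glide-up", "Glide-down",
          "Glide-down", "Glide-down", "Glide-down", "Dive",
          "Dive", "Glide-up"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

In this toy case the per-class recalls are 0.5, 0.75, 1.0 and 0.0, giving a UAR of 0.5625 against an accuracy of 0.625. The gap shows why UAR is a common choice when the classes are unevenly represented.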
