Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech

We present an overview of the ASR challenge for non-native children’s speech organized for a special session at Interspeech 2020. The data for the challenge was collected during a spoken language proficiency assessment administered at Italian schools to students between the ages of 9 and 16 who were studying English and German as foreign languages. The corpus distributed for the challenge was a subset of the English recordings. Participating teams competed either in a closed track, in which they could use only the training data released by the challenge organizers, or in an open track, in which they were allowed to use additional training data. The closed track received 9 entries and the open track received 7 entries, with the best-scoring systems achieving substantial improvements over a state-of-the-art baseline system. This paper describes the corpus of non-native children’s speech used for the challenge, analyzes the results, and discusses points that should be considered for future challenges in this domain.
