The Complementary Roles of Non-Verbal Cues for Robust Pronunciation Assessment

Research on pronunciation assessment systems focuses on the phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information carried by non-verbal cues. In this study, we propose a novel pronunciation assessment framework, IntraVerbalPA. The framework incorporates both fine-grained frame-level and abstract utterance-level non-verbal cues alongside the conventional speech and phoneme representations. Additionally, we introduce a "Goodness of phonemic-duration" metric to model the duration distribution within the framework. Our results validate the effectiveness of the proposed IntraVerbalPA framework and its individual components, yielding performance that either matches or outperforms existing work.
