ShefCE: A Cantonese-English bilingual speech corpus for pronunciation assessment

This paper introduces the development of ShefCE: a Cantonese-English bilingual speech corpus from L2 English speakers in Hong Kong. Bilingual parallel recording materials were chosen from TED online lectures. Script selection was carried out according to bilingual consistency (evaluated using a machine translation system) and the distribution balance of phonemes. Thirty-one undergraduate and postgraduate students in Hong Kong aged 20–30 were recruited, producing a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Baseline phoneme/syllable recognition systems were trained on background data with and without the ShefCE training data. The final syllable error rate (SER) for Cantonese is 17.3% and the final phoneme error rate (PER) for English is 34.5%. The automatic speech recognition performance on English showed a significant mismatch when applying L1 models to L2 data, suggesting the need for explicit accent adaptation. ShefCE and the corresponding baseline models will be made openly available for academic research.
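
As an illustration of how the reported SER and PER figures are typically computed (a minimal sketch, not code from the paper), the snippet below scores a recognised token sequence against a reference using Levenshtein alignment and divides the edit distance by the reference length. The error_rate function name and the example phoneme sequences are hypothetical.

# Sketch only: token-level error rate (PER for phonemes, SER for syllables),
# computed as (substitutions + deletions + insertions) / reference length.
def error_rate(reference, hypothesis):
    ref, hyp = list(reference), list(hypothesis)
    # dp[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical example with ARPAbet-style phoneme labels:
ref_phones = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
hyp_phones = ["HH", "AH", "L", "OW", "W", "AH", "D"]
print(f"PER = {error_rate(ref_phones, hyp_phones):.1%}")  # 25.0%

The same measure applies per language: aligning recognised syllable sequences against Cantonese references yields the SER, and aligning recognised phoneme sequences against English references yields the PER.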
