Comparison of Manual and Semi-manual Annotation of a CAPT Speech Corpus

Annotation plays an important role in a robust computer-aided pronunciation training (CAPT) system. Manual annotation is time-consuming and labour-intensive for annotators, while automatic annotation is more efficient but less accurate. This paper proposes providing phoneme-level labelling candidates with ASR models and compares this semi-manual method with fully manual annotation. In previous work, phoneme-level mispronunciation patterns were modelled and detected to provide readable feedback on pronunciation erroneous tendencies (PETs). In this paper, the same model is used to generate phoneme-level labelling candidates, from which annotators choose the appropriate labels and make the final decision. Experimental results show that, compared with manual annotation, the consistency rate of semi-manual annotation increased from 88.52% to 92.88%. In addition, the false positive rate (FPR) was reduced by 3%, while the posterior F1 score declined by more than 3%. The reported results demonstrate the efficiency of the proposed method.
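
To make the reported measures concrete, the following is a minimal sketch (not the authors' code) of how a consistency rate between two annotators and the FPR/F1 of mispronunciation detection could be computed from aligned phoneme-level labels; all function names and example labels are illustrative assumptions.

```python
# Hedged sketch: metrics assumed to match the abstract's definitions,
# computed over aligned phoneme-level label sequences.
from typing import Dict, List


def consistency_rate(labels_a: List[str], labels_b: List[str]) -> float:
    """Fraction of phoneme positions where two annotations agree."""
    assert len(labels_a) == len(labels_b), "label sequences must be aligned"
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)


def detection_metrics(reference: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """FPR and F1 for binary mispronunciation detection.

    reference[i] / predicted[i] are True when phoneme i is marked as
    mispronounced by the reference annotation / the system, respectively.
    """
    tp = sum(r and p for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))
    tn = sum((not r) and (not p) for r, p in zip(reference, predicted))

    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"FPR": fpr, "precision": precision, "recall": recall, "F1": f1}


if __name__ == "__main__":
    # Toy example: two annotators labelling five phonemes (labels are made up).
    ann_a = ["AH", "P_err", "T", "IY", "N"]
    ann_b = ["AH", "P_err", "T", "IH", "N"]
    print(f"consistency rate: {consistency_rate(ann_a, ann_b):.2%}")

    # Toy example: detection decisions against a reference annotation.
    ref = [False, True, False, True, False]
    hyp = [False, True, True, True, False]
    print(detection_metrics(ref, hyp))
```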