Comparison of Manual and Semi-manual Annotation of a CAPT Speech Corpus

Annotation plays an important role in a robust computer-aided pronunciation training (CAPT) system. Manual annotation is time-consuming and labour-intensive for annotators, while automatic annotation is more efficient but less accurate. This paper proposes providing phoneme-level labelling candidates with ASR models and compares this semi-manual method with fully manual annotation. In previous work, phoneme-level mispronunciation patterns were modelled and detected to provide readable feedback on pronunciation erroneous tendencies (PETs). In this paper, the same model is used to generate phoneme-level labelling candidates, from which annotators choose the appropriate labels and make the final decision. Experimental results show that, compared with manual annotation, the consistency rate of semi-manual annotation increased from 88.52% to 92.88%. In addition, the false positive rate (FPR) was reduced by 3%, while the posterior F1 score declined by more than 3%. The reported results demonstrate the efficiency of the proposed method.
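
To make the reported measures concrete, the following is a minimal sketch (not the authors' code) of how a consistency rate between two annotators and the FPR/F1 of mispronunciation detection could be computed from aligned phoneme-level labels; all function names and example labels are illustrative assumptions.

```python
# Hedged sketch: metrics assumed to match the abstract's definitions,
# computed over aligned phoneme-level label sequences.
from typing import Dict, List


def consistency_rate(labels_a: List[str], labels_b: List[str]) -> float:
    """Fraction of phoneme positions where two annotations agree."""
    assert len(labels_a) == len(labels_b), "label sequences must be aligned"
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)


def detection_metrics(reference: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """FPR and F1 for binary mispronunciation detection.

    reference[i] / predicted[i] are True when phoneme i is marked as
    mispronounced by the reference annotation / the system, respectively.
    """
    tp = sum(r and p for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))
    tn = sum((not r) and (not p) for r, p in zip(reference, predicted))

    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"FPR": fpr, "precision": precision, "recall": recall, "F1": f1}


if __name__ == "__main__":
    # Toy example: two annotators labelling five phonemes (labels are made up).
    ann_a = ["AH", "P_err", "T", "IY", "N"]
    ann_b = ["AH", "P_err", "T", "IH", "N"]
    print(f"consistency rate: {consistency_rate(ann_a, ann_b):.2%}")

    # Toy example: detection decisions against a reference annotation.
    ref = [False, True, False, True, False]
    hyp = [False, True, True, True, False]
    print(detection_metrics(ref, hyp))
```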