Paper information - Use and Evaluation of Prosodic Annotations in Dutch

Use and Evaluation of Prosodic Annotations in Dutch

In the development of annotations for a spoken database, an important issue is whether the annotations can be generated automatically with sufficient precision, or whether expensive manual annotations are needed. This paper discusses the case of prosodic annotations, investigated on the CGN database (Spoken Dutch Corpus). The main conclusions are as follows. First, the available amount of manual prosodic annotations is sufficient for the development of our (baseline, decision-tree-based) prosodic models; in other words, more manual annotations do not improve the models. Second, the developed prosodic models for prominence are not accurate enough to produce automatic prominence annotations that are as good as the manual ones. On the other hand, the consistency between manual and automatic break annotations is as high as the inter-transcriber consistency for breaks. So, given the current amount of manual break annotations, break annotations for the remainder of the CGN database can be generated automatically with the same quality as the manual annotations.
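The comparison between manual-versus-automatic consistency and inter-transcriber consistency can be illustrated with a small sketch. The abstract does not say which agreement measure the authors used; Cohen's kappa is assumed here only because it is a common choice for comparing two label sequences. The function name `cohen_kappa` and the toy break labels are hypothetical and are not taken from the paper or from CGN.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of positions where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (not from CGN): 1 = break after the word, 0 = no break.
manual_1 = [0, 1, 0, 0, 1, 0, 1, 0]
manual_2 = [0, 1, 0, 1, 1, 0, 1, 0]
automatic = [0, 1, 0, 0, 1, 0, 0, 0]

inter_transcriber = cohen_kappa(manual_1, manual_2)
manual_vs_auto = cohen_kappa(manual_1, automatic)
```

Under this assumed measure, automatic annotations would count as being of manual quality when `manual_vs_auto` is about as high as `inter_transcriber`, which is what the abstract reports for breaks but not for prominence.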

Hugo Van hamme | Jacques Duchateau | Tim Ceyssens
