Agreement Learning for Automatic Accent Annotation

Automatic accent annotation is important in both speech synthesis and speech recognition. Existing statistical learning algorithms rely heavily on a sufficiently large set of labeled training samples, which are expensive and time-consuming to collect. With unlabeled data, however, unsupervised learning can be initiated from a small set of manually labeled samples. This paper shows that the accuracy of automatic accent annotation can be improved by augmenting a small amount of manually labeled data with a large pool of unlabeled data. We introduce an agreement-learning algorithm for this purpose. Experimental results show that human labeling effort can be reduced significantly while errors are reduced by up to 50%.
