Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm

Most of previous approaches to automatic prosodic event detection are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic labels of interest in order to train the classification models. However, creating such resources is an expensive and time-consuming task. In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of coarse level representation of prosodic events such as pitch accents, intonational phrase boundaries, and break indices. We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method. In addition, we examine various informative sample selection methods. In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.

[1]  Stephen R. Clark,et al.  CLSP WS-02 Final Report: Semi-Supervised Training for Statistical Parsing , 2003 .

[2]  Craig A. Knoblock,et al.  Selective Sampling with Redundant Views , 2000, AAAI/IAAI.

[3]  Yang Liu,et al.  Automatic prosodic events detection using syllable-based acoustic and syntactic features , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Sanjoy Dasgupta,et al.  PAC Generalization Bounds for Co-training , 2001, NIPS.

[5]  Treebank Penn,et al.  Linguistic Data Consortium , 1999 .

[6]  Wen Wang,et al.  Semi-Supervised Learning for Part-of-Speech Tagging of Mandarin Transcribed Speech , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[7]  James R. Curran,et al.  Bootstrapping POS-taggers using unlabelled data , 2003, CoNLL.

[8]  Gina-Anne Levow,et al.  Unsupervised and Semi-supervised Learning of Tone and Pitch Accent , 2006, NAACL.

[9]  Shrikanth S. Narayanan,et al.  Combining acoustic, lexical, and syntactic evidence for automatic unsupervised prosody labeling , 2006, INTERSPEECH.

[10]  Mari Ostendorf,et al.  TOBI: a standard for labeling English prosody , 1992, ICSLP.

[11]  Yan Zhou,et al.  Enhancing Supervised Learning with Unlabeled Data , 2000, ICML.

[12]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[13]  Shrikanth S. Narayanan,et al.  Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  Shrikanth S. Narayanan,et al.  Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Rayid Ghani,et al.  Analyzing the effectiveness and applicability of co-training , 2000, CIKM '00.

[16]  Mark Hasegawa-Johnson,et al.  An automatic prosody labeling system using ANN-based syntactic-prosodic model and GMM-based acoustic-prosodic model , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Mari Ostendorf,et al.  Automatic labeling of prosodic patterns , 1994, IEEE Trans. Speech Audio Process..