Multi-View Semi-Supervised Learning for Dialog Act Segmentation of Speech

Sentence segmentation of speech aims at determining sentence boundaries in a stream of words as output by the speech recognizer. Typically, statistical methods are used for sentence segmentation. However, they require significant amounts of labeled data, preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms on the sentence boundary classification problem by using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We especially focus on two semi-supervised learning approaches, namely, self-training and co-training. We also compare different example selection strategies for co-training, namely, agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is very appropriate for multi-view learning since the data sets can be represented by two disjoint and redundantly sufficient feature sets, namely, using lexical and prosodic information. Performance of the lexical and prosodic models is improved by 26% and 11% relative, respectively, when only a small set of manually labeled examples is used. When both information sources are combined, the semi-supervised learning methods improve the baseline F-Measure of 69.8% to 74.2%.

[1]  Gökhan Tür,et al.  MODEL ADAPTATION FOR SENTENCE SEGMENTATION FROM SPEECH , 2006, 2006 IEEE Spoken Language Technology Workshop.

[2]  Eugene Charniak,et al.  Effective Self-Training for Parsing , 2006, NAACL.

[3]  Brian Roark,et al.  Unsupervised language model adaptation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[4]  Elizabeth Shriberg,et al.  Speaker adaptation of language models for automatic dialog act segmentation of meetings , 2007, INTERSPEECH.

[5]  Giuseppe Riccardi,et al.  On-line learning of language models with word error probability distributions , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[6]  Robert E. Schapire,et al.  The Boosting Approach to Machine Learning An Overview , 2003 .

[7]  Gökhan Tür,et al.  Unsupervised and active learning in automatic speech recognition for call classification , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Elizabeth Shriberg,et al.  Automatic dialog act segmentation and classification in multiparty meetings , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[9]  Gökhan Tür,et al.  Co-training using prosodic and lexical information for sentence segmentation , 2007, INTERSPEECH.

[10]  Rada Mihalcea,et al.  Co-training and Self-training for Word Sense Disambiguation , 2004, CoNLL.

[11]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[12]  Andreas Stolcke,et al.  Using Conditional Random Fields for Sentence Boundary Detection in Speech , 2005, ACL.

[13]  Dilek Z. Hakkani-Tür,et al.  The ICSI+ multilingual sentence segmentation system , 2006, INTERSPEECH.

[14]  Elizabeth Shriberg,et al.  The ICSI Meeting Recorder Dialog Act (MRDA) Corpus , 2004, SIGDIAL Workshop.

[15]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[16]  Elizabeth Shriberg,et al.  Using Prosody for Automatic Sentence Segmentation of Multi-party Meetings , 2006, TSD.

[17]  Wen Wang,et al.  Semi-Supervised Learning for Part-of-Speech Tagging of Mandarin Transcribed Speech , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[18]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[19]  Gökhan Tür,et al.  Prosody-based automatic segmentation of speech into sentences and topics , 2000, Speech Commun..

[20]  Tom Michael Mitchell,et al.  The Role of Unlabeled Data in Supervised Learning , 2004 .

[21]  Stan Matwin,et al.  Email classification with co-training , 2011, CASCON.

[22]  Mark G. Core,et al.  Coding Dialogs with the DAMSL Annotation Scheme , 1997 .

[23]  Andreas Stolcke,et al.  Statistical language modeling for speech disfluencies , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.