Speaker adaptation of language and prosodic models for automatic dialog act segmentation of speech

Speaker-dependent modeling has a long history in speech recognition, but has received less attention in speech understanding. This study explores speaker-specific modeling for the task of automatic segmentation of speech into dialog acts (DAs), using a linear combination of speaker-dependent and speaker-independent language and prosodic models. Data come from 20 frequent speakers in the ICSI meeting corpus; the amount of adaptation data per speaker ranges from 5k to 115k words. We compare performance on both reference transcripts and automatic speech recognition output. We find that: (1) speaker adaptation in this domain results both in a significant overall improvement and in improvements for many individual speakers, (2) the magnitude of improvement for individual speakers does not depend on the amount of adaptation data, and (3) language and prosodic models differ both in degree of improvement and in relative benefit for specific DA classes. These results suggest important future directions for speaker-specific modeling in spoken language understanding tasks.
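
The linear combination of speaker-dependent and speaker-independent models can be illustrated with a minimal sketch. It assumes each model outputs a posterior probability of a DA boundary after every word and that a single weight is tuned per speaker on held-out data. The function names, the weight `lam`, and the decision threshold are hypothetical and are not taken from the paper.

```python
from typing import List


def interpolate_posteriors(p_si: List[float], p_sd: List[float], lam: float) -> List[float]:
    """Linearly combine per-word boundary posteriors.

    p_si: posteriors from the speaker-independent model (assumed input)
    p_sd: posteriors from the speaker-dependent model (assumed input)
    lam:  weight on the speaker-dependent model, in [0, 1] (hypothetical name)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(p_si) != len(p_sd):
        raise ValueError("both models must score the same word sequence")
    return [lam * sd + (1.0 - lam) * si for si, sd in zip(p_si, p_sd)]


def mark_boundaries(posteriors: List[float], threshold: float = 0.5) -> List[bool]:
    """Hypothesize a DA boundary wherever the combined posterior reaches the threshold."""
    return [p >= threshold for p in posteriors]


if __name__ == "__main__":
    # Toy posteriors for a four-word stretch of speech (made-up numbers).
    si = [0.10, 0.40, 0.05, 0.70]
    sd = [0.20, 0.80, 0.05, 0.60]
    combined = interpolate_posteriors(si, sd, lam=0.5)
    print(mark_boundaries(combined))
```

With `lam = 0` the combination falls back to the speaker-independent model, and with `lam = 1` it relies on the speaker-dependent model alone; intermediate values trade off the two.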
