Speech Segmentation and its Impact on Spoken Document Processing

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken—rather than written—language input. A key challenge in moving from text-based documents to such “spoken documents” is that spoken language lacks explicit punctuation and formatting, which can be crucial for good performance. This paper describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating impact on several language processing tasks. The results also show a need for optimizing segmentation for the end task rather than independently.

[1]  Heidi Christensen,et al.  From Text Summarisation to Style-Specific Summarisation for Broadcast News , 2004, ECIR.

[2]  Dilek Z. Hakkani-Tür,et al.  Improving speech translation with automatic boundary prediction , 2007, INTERSPEECH.

[3]  Richard M. Schwartz,et al.  The effects of speech recognition and punctuation on information extraction performance , 2005, INTERSPEECH.

[4]  Geoffrey Zweig,et al.  Maximum entropy model for punctuation annotation from speech , 2002, INTERSPEECH.

[5]  Andreas Stolcke,et al.  Automatic linguistic segmentation of conversational speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Hermann Ney,et al.  The RWTH statistical machine translation system for the IWSLT 2006 evaluation , 2006, IWSLT.

[7]  Eugene Charniak,et al.  Edit Detection and Parsing for Transcribed Speech , 2001, NAACL.

[8]  Mary P. Harper,et al.  2005 Johns Hopkins Summer Workshop Final Report on Parsing and Spoken Structural Event Detection , 2005 .

[9]  Marti A. Hearst Text Tiling: Segmenting Text into Multi-paragraph Subtopic Passages , 1997, CL.

[10]  Dilek Z. Hakkani-Tür,et al.  Prosodic Similarities of Dialog Act Boundaries Across Speaking Styles , 2008 .

[11]  Dilek Z. Hakkani-Tür,et al.  The ICSI+ multilingual sentence segmentation system , 2006, INTERSPEECH.

[12]  Elizabeth Shriberg,et al.  The ICSI Meeting Recorder Dialog Act (MRDA) Corpus , 2004, SIGDIAL Workshop.

[13]  Julia Hirschberg,et al.  Varying Input Segmentation for Story Boundary Detection in English, Arabic and Mandarin Broadcast News , 2007 .

[14]  Ralph Grishman,et al.  NYU's English ACE 2005 System Description , 2005 .

[15]  Mary P. Harper,et al.  Reranking for Sentence Boundary Detection in Conversational Speech , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[16]  Gerald Penn,et al.  Comparing the roles of textual, acoustic and spoken-language features on spontaneous-conversation summarization , 2006, NAACL.

[17]  Feifan Liu,et al.  Soundbite identification using reference and automatic transcripts of broadcast news speech , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[18]  Dilek Z. Hakkani-Tür,et al.  IMPACT OF AUTOMATIC COMMA PREDICTION ON POS/NAME TAGGING OF SPEECH , 2006, 2006 IEEE Spoken Language Technology Workshop.

[19]  Douglas A. Reynolds,et al.  Measuring human readability of machine generated text: three case studies in speech recognition and machine translation , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[20]  Patrick Nguyen,et al.  Finding Speaker Identities with a Conditional Maximum Entropy Model , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[21]  Marcus Tomalin,et al.  Discriminatively Trained Gaussian Mixture Models for Sentence Boundary Detection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[22]  Hermann Ney,et al.  Automatic sentence segmentation and punctuation prediction for spoken language translation , 2006, IWSLT.

[23]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Dilek Z. Hakkani-Tür,et al.  Punctuating speech for information extraction , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25]  Hermann Ney,et al.  Discriminative Reordering Models for Statistical Machine Translation , 2006, WMT@HLT-NAACL.

[26]  Mari Ostendorf,et al.  Parsing Conversational Speech Using Enhanced Segmentation , 2004, NAACL.

[27]  Andreas Stolcke,et al.  Enriching speech recognition with automatic detection of sentence boundaries and disfluencies , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[28]  Richard M. Schwartz,et al.  Integrating Speech Recognition and Machine Translation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[29]  Sadaoki Furui,et al.  Automatic speech summarization applied to English broadcast news speech , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.