Speech segmentation and spoken document processing

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken rather than written language input. A key challenge in moving from text-based documents to such spoken documents is that spoken language lacks the explicit punctuation and formatting of text, cues that can be crucial for good performance in downstream language processing. This article describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating the impact of segmentation on several language processing tasks. The results also show a need to optimize segmentation for the end task rather than independently of it.
