Speech and Spoken Document

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken, rather than written, language input. A key challenge in moving from text-based documents to such “spoken documents” is that spoken language lacks explicit punctuation and formatting, which can be crucial for good performance. This article describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating the impact of segmentation on several language processing tasks. The results also show a need to optimize segmentation for the end task rather than independently of it.