Cross-Genre Feature Comparisons for Spoken Sentence Segmentation

Automatic sentence segmentation of spoken language is an important precursor to downstream natural language processing. Previous studies combine lexical and prosodic features, but the resulting large feature sets can impose significant computational costs. Little is understood about which features most benefit performance, particularly for speech from different speaking styles. We compare sentence segmentation for speech from broadcast news versus natural multi-party meetings, using identical lexical and prosodic feature sets across genres. Results based on boosting and forward selection for this task show that (1) feature sets can be reduced with little or no loss in performance, and (2) the contribution of different feature types differs significantly by genre. We conclude that more efficient approaches to sentence segmentation and similar tasks can be achieved, especially if genre differences are taken into account.
