Story Segmentation of Broadcast News in English, Mandarin and Arabic

In this paper, we present results from a Broadcast News story segmentation system developed for use in the SRI NIGHTINGALE system. The segmenter operates on English, Arabic and Mandarin news shows and provides input to subsequent question-answering processes. Using a rule-induction algorithm with automatically extracted acoustic and lexical features, we report success rates on each input language that are competitive with state-of-the-art systems. We further demonstrate that features useful for English and Mandarin are not discriminative for Arabic.
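The following is a minimal sketch of how a rule-induction learner can label candidate story boundaries from feature vectors. The abstract does not give the learner's details, so this is a simplified sequential-covering learner (greedy rule growth by FOIL-style gain, no pruning). It is not the algorithm used in the paper. The two features are hypothetical stand-ins: `pause` (seconds of silence before the candidate, an acoustic cue) and `cue` (1 if a cue phrase occurs nearby, a lexical cue). The toy data are invented purely for illustration.

```python
from dataclasses import dataclass, field
from math import log2

# Hypothetical toy data (invented): each candidate boundary has an acoustic
# feature "pause" (seconds) and a lexical feature "cue" (1 if a cue phrase
# occurs nearby). Labels: True = story boundary.
TOY_EXAMPLES = [
    ({"pause": 1.2, "cue": 1}, True),
    ({"pause": 0.9, "cue": 0}, True),
    ({"pause": 0.3, "cue": 1}, True),
    ({"pause": 0.2, "cue": 0}, False),
    ({"pause": 0.4, "cue": 0}, False),
    ({"pause": 0.1, "cue": 0}, False),
]

@dataclass(frozen=True)
class Condition:
    feature: str
    op: str  # either "<=" or ">"
    threshold: float

    def holds(self, x):
        v = x[self.feature]
        return v <= self.threshold if self.op == "<=" else v > self.threshold

@dataclass
class Rule:
    conditions: list = field(default_factory=list)

    def covers(self, x):
        # An empty rule covers everything; otherwise every condition must hold.
        return all(c.holds(x) for c in self.conditions)

def candidate_conditions(examples):
    """Threshold tests at midpoints between adjacent distinct feature values."""
    conds = []
    for f in examples[0][0]:
        values = sorted({x[f] for x, _ in examples})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            conds.append(Condition(f, "<=", t))
            conds.append(Condition(f, ">", t))
    return conds

def foil_gain(cond, covered):
    """FOIL information gain of adding `cond` to a rule that covers `covered`."""
    p0 = sum(1 for _, y in covered if y)
    n0 = len(covered) - p0
    kept = [y for x, y in covered if cond.holds(x)]
    p1 = sum(kept)
    n1 = len(kept) - p1
    if p1 == 0:
        return float("-inf")
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

def grow_rule(examples):
    """Greedily add conditions until the rule covers no negative examples."""
    rule, covered = Rule(), examples
    while any(not y for _, y in covered):
        conds = candidate_conditions(covered)
        if not conds:
            break
        best = max(conds, key=lambda c: foil_gain(c, covered))
        if foil_gain(best, covered) <= 0:
            break
        rule.conditions.append(best)
        covered = [(x, y) for x, y in covered if best.holds(x)]
    return rule

def learn_rules(examples, max_rules=10):
    """Sequential covering: learn a rule, drop the examples it covers, repeat."""
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining) and len(rules) < max_rules:
        rule = grow_rule(remaining)
        covered = [y for x, y in remaining if rule.covers(x)]
        pos = sum(covered)
        # Stop if the rule is empty or covers at least as many negatives as positives.
        if not rule.conditions or pos <= len(covered) - pos:
            break
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not rule.covers(x)]
    return rules

def predict(rules, x):
    """Label x a story boundary if any learned rule fires."""
    return any(r.covers(x) for r in rules)

if __name__ == "__main__":
    for r in learn_rules(TOY_EXAMPLES):
        print(" AND ".join(f"{c.feature} {c.op} {c.threshold:.2f}" for c in r.conditions))
```

On the invented toy data, this learner produces two rules. One thresholds `pause` and the other tests `cue`, so a candidate is labeled a boundary if either an acoustic or a lexical cue fires. An ordered rule list of this kind stays human-readable. That property makes it easy to compare which features the learned rules rely on for different input languages.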
