Broadcast News Gisting Using Lexical Cohesion Analysis

In this paper we describe an extractive method of creating very short summaries, or gists, that capture the essence of a news story using a linguistic technique called lexical chaining. The recent interest in robust gisting and title generation techniques stems from the need to improve the indexing and browsing capabilities of interactive digital multimedia systems. More specifically, these systems deal with streams of continuous data, such as a news programme, that require further annotation before they can be presented to the user in a meaningful way. We automatically evaluate the performance of our lexical chaining-based gister against four baseline extractive gisting methods on a collection of closed-caption material taken from a series of news broadcasts. We also report the results of a human evaluation of summary quality. Our results show that our novel lexical chaining approach outperforms standard extractive gisting methods.
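
To make the general idea of extractive gisting with lexical chains concrete, the following is a minimal sketch and not the system evaluated here. It assumes a tiny hand-written concept map (`TOY_CONCEPTS`, a hypothetical stand-in for a real thesaurus), treats every concept as one chain, takes a chain's strength to be its number of members, and returns the sentence whose chain members carry the highest total strength. All function names and the scoring rule are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: maps a word to a concept label.
# A real chainer would consult a lexical resource instead.
TOY_CONCEPTS = {
    "storm": "weather", "hurricane": "weather",
    "rain": "weather", "winds": "weather",
    "coast": "place", "town": "place", "city": "place",
}


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_chains(sentences):
    """Group related word occurrences into chains, one chain per concept.

    Each chain is a list of (sentence_index, word) pairs.
    """
    chains = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        for word in tokenize(sentence):
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append((idx, word))
    return dict(chains)


def score_sentences(sentences, chains):
    """Score each sentence by summing the strength of the chains it touches.

    Assumption: chain strength is simply the chain's member count, and a
    sentence earns that strength once per chain member it contains.
    """
    scores = [0] * len(sentences)
    for members in chains.values():
        strength = len(members)
        for idx, _word in members:
            scores[idx] += strength
    return scores


def gist(sentences):
    """Return the highest-scoring sentence (first one wins on ties)."""
    scores = score_sentences(sentences, build_chains(sentences))
    best = max(range(len(sentences)), key=lambda i: scores[i])
    return sentences[best]


story = [
    "A hurricane hit the coast last night.",
    "Officials said the storm brought heavy rain and strong winds to the city.",
    "Schools will reopen on Monday.",
]
print(gist(story))
```

In this toy story the second sentence is selected because it contains three members of the strongest chain. A practical gister would also need word sense disambiguation, a richer notion of relatedness, and some way of trimming the extract to headline length.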
