Using Thematic Information in Statistical Headline Generation

We explore the problem of single-sentence summarisation. In the news domain, such a summary might resemble a headline. The headline generation system we present uses Singular Value Decomposition (SVD) to guide the generation of a headline towards the theme that best represents the document to be summarised. The intuition is that a summary generated this way will more accurately reflect the content of the source document. This paper presents SVD as an alternative method for determining whether a word is a suitable candidate for inclusion in the headline. The results of a recall-based evaluation comparing three different word selection strategies indicate that thematic information does help improve recall.
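
The abstract does not spell out how SVD is applied, so the following is only a rough sketch of the general idea, not the system described in the paper. It assumes a term-by-sentence count matrix and treats a word's absolute weight in the first left singular vector as its affinity to the dominant theme. The function name `theme_word_scores`, the whitespace tokenisation, and the toy sentences are all hypothetical choices made for illustration.

```python
import numpy as np

def theme_word_scores(sentences):
    """Score each word by its weight in the dominant SVD theme (illustrative only)."""
    # Naive whitespace tokenisation; an assumption, not the paper's preprocessing.
    tokenised = [s.lower().split() for s in sentences]
    vocab = sorted({w for sent in tokenised for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence count matrix: rows are words, columns are sentences.
    A = np.zeros((len(vocab), len(tokenised)))
    for j, sent in enumerate(tokenised):
        for w in sent:
            A[index[w], j] += 1.0

    # Singular values are returned in descending order, so column 0 of U
    # corresponds to the strongest theme in this toy setup.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    weights = np.abs(U[:, 0])
    return dict(zip(vocab, weights))

# Hypothetical three-sentence "document".
sentences = [
    "the bank raised interest rates",
    "interest rates rose at the bank",
    "the bank cited inflation",
]
scores = theme_word_scores(sentences)
top_words = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_words)
```

In this sketch, words that occur across all sentences receive the largest weights, which is why a real system would likely need stop-word handling or term weighting before such scores could inform which words are candidates for a headline.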
