Improving LSA-based Summarization with Anaphora Resolution

We propose an approach to summarization that exploits both lexical information and the output of an automatic anaphora resolver, using Singular Value Decomposition (SVD) to identify the main terms. We demonstrate that adding anaphoric information yields significant performance improvements over a previously developed system in which only lexical terms are used as the input to SVD. However, we also show that the way anaphoric information is used is crucial: using it to add new terms improves performance, whereas simple substitution makes performance worse.
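To make the "addition" strategy concrete, the following is a minimal sketch, not the system described in the abstract. It assumes a plain term-by-sentence count matrix, a hypothetical resolver output given as lists of sentence indices per coreference chain, and a length-based sentence score in the SVD-reduced space. The function names, the toy sentences, and the scoring formula are all illustrative assumptions.

```python
import numpy as np

def build_matrix(sentences, chains=None):
    """Term-by-sentence count matrix.

    sentences: list of token lists.
    chains: optional list of coreference chains, each a list of sentence
            indices in which the chain's entity is mentioned (assumed
            resolver output format, not taken from the paper).
    """
    vocab = sorted({w for s in sentences for w in s})
    row = {w: i for i, w in enumerate(vocab)}
    n_extra = len(chains) if chains else 0
    A = np.zeros((len(vocab) + n_extra, len(sentences)))
    for j, sent in enumerate(sentences):
        for w in sent:
            A[row[w], j] += 1
    # "Addition": every chain contributes one extra term row, so sentences
    # that mention the same entity share a term even if the words differ.
    for c, sent_ids in enumerate(chains or []):
        for j in sent_ids:
            A[len(vocab) + c, j] += 1
    return A

def sentence_scores(A, k):
    """Length of each sentence vector in the top-k latent space,
    with each dimension weighted by its singular value (assumed scoring)."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return np.sqrt(((Vt[:k].T * s[:k]) ** 2).sum(axis=1))

# Toy document: "he" and "it" in sentence 1 refer back to sentence 0.
sentences = [
    ["smith", "founded", "company"],
    ["he", "sold", "it"],
    ["weather", "was", "fine"],
]
chains = [[0, 1], [0, 1]]  # hypothetical chains: Smith/he, company/it

A = build_matrix(sentences, chains)
scores = sentence_scores(A, k=1)
```

In this toy case the lexical terms alone give three sentences with no words in common, so no topic stands out. With the two chain rows added, sentences 0 and 1 share terms and dominate the first latent dimension, while the unrelated third sentence scores near zero.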
