Using syntactic information to extract relevant terms for multi-document summarization

Identifying the key concepts in a set of documents is useful for several information access applications. We are interested in its application to multi-document summarization, both for the automatic generation of summaries and for interactive summarization systems. In this paper, we study whether the syntactic position of terms in the texts can be used to predict which terms are good candidates as key concepts. Our experiments show that a) distance to the verb is highly correlated with the probability of a term being part of a key concept; b) subject modifiers are the best syntactic locations to find relevant terms; and c) in the task of automatically finding key terms, the combination of statistical term weights with shallow syntactic information gives better results than statistical measures alone.
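As a rough illustration of finding c), the following Python sketch ranks candidate terms by mixing a statistical weight with a prior attached to the syntactic position in which each term occurs. It is not the method used in the paper: the function name `combine_scores`, the linear mix, the parameter `alpha`, the position labels, and all numeric values are assumptions made for the example. In practice the position priors would have to be estimated from annotated data.

```python
from collections import defaultdict


def combine_scores(stat_weights, occurrences, position_prior, alpha=0.5):
    """Rank terms by a linear mix of statistical weight and a syntactic prior.

    stat_weights:   term -> non-negative statistical weight (e.g. a tf-idf value)
    occurrences:    list of (term, position_label) pairs from a shallow parse
    position_prior: position_label -> assumed relevance estimate in [0, 1]
    alpha:          mixing weight of the statistical part (hypothetical parameter)
    """
    # Normalize statistical weights to [0, 1] so both components are comparable.
    max_w = max(stat_weights.values(), default=0.0) or 1.0

    # Collect the prior of every syntactic position in which a term was seen.
    priors_by_term = defaultdict(list)
    for term, label in occurrences:
        priors_by_term[term].append(position_prior.get(label, 0.0))

    scores = {}
    for term, weight in stat_weights.items():
        priors = priors_by_term.get(term, [])
        syntactic = sum(priors) / len(priors) if priors else 0.0
        scores[term] = alpha * (weight / max_w) + (1 - alpha) * syntactic

    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: every value below is invented for illustration only.
stat_weights = {"summit": 2.0, "meeting": 4.0, "yesterday": 3.0}
occurrences = [
    ("summit", "subject_modifier"),
    ("meeting", "object_head"),
    ("yesterday", "adjunct"),
]
position_prior = {"subject_modifier": 0.9, "object_head": 0.3, "adjunct": 0.1}

combined = combine_scores(stat_weights, occurrences, position_prior, alpha=0.5)
statistical_only = combine_scores(stat_weights, occurrences, position_prior, alpha=1.0)
print(combined)
print(statistical_only)
```

With `alpha=1.0` the ranking reduces to the statistical weights alone, which makes it easy to compare the two settings on the same candidate list.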
