Automatic Text Summarization (The state of the art 2007 and new challenges)

The title of this paper names a research area that originated in the late 1950s and has not lost its popularity to the present day. Moreover, information overload, one of today's most pressing problems caused by the rapid growth of the Web, has increased the need for more sophisticated and powerful summarizers. This paper briefly introduces a taxonomy of summarization methods and gives an overview of their principles, from classical methods, through corpus-based methods, to knowledge-rich approaches. We consider various aspects that can affect their classification. Special attention is devoted to the application of recent information reduction methods based on algebraic transformations. Further, we describe our experience with developing our own summarization method. Finally, we outline some new ideas and a vision for the future of this field.
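To make the idea of algebraic information reduction more concrete, here is a minimal sketch of one generic LSA-style sentence scorer. It is an illustration only, not the authors' own summarizer. It assumes a term-by-sentence matrix A of raw word counts with naive lowercase tokenization, and it scores sentence j by the length of its vector in an r-dimensional latent space, sqrt(sum_k sigma_k^2 * v_k[j]^2), where v_k are the top right singular vectors of A. Because those vectors are the eigenvectors of A^T A with eigenvalues sigma_k^2, the sketch replaces a full SVD routine with pure-Python power iteration plus deflation on A^T A so that it needs no external libraries. The function names (`tokenize`, `sentence_gram_matrix`, `top_eigenpairs`, `lsa_summary`) and the defaults (r = 2, 200 iterations) are hypothetical choices.

```python
import math
import re
from collections import Counter

# Hypothetical sketch of LSA-style sentence scoring; not the paper's method.


def tokenize(sentence):
    # Assumption: lowercase alphabetic tokens, no stemming or stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_gram_matrix(sentences):
    # With A the term-by-sentence matrix of raw counts, build A^T A:
    # entry (i, j) is the dot product of the term vectors of sentences i and j.
    vecs = [Counter(tokenize(s)) for s in sentences]
    return [[float(sum(a[t] * b[t] for t in a)) for b in vecs] for a in vecs]


def top_eigenpairs(G, r, iters=200):
    # Power iteration with deflation on the symmetric PSD matrix G = A^T A.
    # Its eigenvectors are the right singular vectors of A; eigenvalues are sigma^2.
    n = len(G)
    G = [row[:] for row in G]
    pairs = []
    for _ in range(r):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero (rank < r)
                return pairs
            v = [x / norm for x in w]
        # Rayleigh quotient of the converged unit vector gives its eigenvalue.
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so that the next pass converges to the next-largest eigenpair.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_summary(sentences, m, r=2):
    # Score sentence j as sqrt(sum_k sigma_k^2 * v_k[j]^2) over the top r dimensions.
    pairs = top_eigenpairs(sentence_gram_matrix(sentences), r)
    scores = [
        math.sqrt(sum(max(lam, 0.0) * v[j] ** 2 for lam, v in pairs))
        for j in range(len(sentences))
    ]
    best = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:m]
    return [sentences[j] for j in sorted(best)]  # keep original sentence order
```

For example, `lsa_summary(["cats purr", "cats purr loudly", "stocks fell"], m=2, r=1)` keeps the two sentences that share vocabulary and drops the isolated one. A practical system would instead call a numerical library's SVD and apply term weighting and stop-word filtering; the simplifications here only serve to keep the sketch short and self-contained.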
