An Automatic Text Summarization Approach using Content-Based and Graph-Based Characteristics

The continuing growth of the World Wide Web and of on-line text collections makes a large volume of information available to users. Automatic text summarization allows users to quickly understand documents. In this paper, we propose an automated technique for single-document summarization that combines content-based and graph-based approaches, and we introduce the Hopfield network algorithm as a technique for ranking text segments. A series of experiments is performed using the DUC collection and a Thai-document collection. The results show the superiority of the proposed technique over the reference systems. In addition, the Hopfield network algorithm on an undirected graph is shown to be the best text segment ranking algorithm in the study.
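
To make the idea of ranking text segments with a Hopfield network on an undirected graph more concrete, the following is a minimal sketch and not the implementation evaluated in the paper. The abstract does not give the graph construction or the activation rule, so everything here is an assumption: sentences are taken as the text segments, edge weights are cosine similarities between bag-of-words vectors, and activation spreads through a sigmoid transfer function until it stabilizes. The function names (`bag_of_words`, `cosine`, `hopfield_rank`) and the parameters `theta`, `steepness`, `tol`, and `max_iter` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence (no stemming or stop-word removal)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hopfield_rank(sentences, theta=0.5, steepness=4.0, tol=1e-6, max_iter=100):
    """Rank sentences by their activation on an undirected similarity graph.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns (sentence indices from highest to lowest activation, activations).
    """
    vecs = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    # Undirected graph: weights are symmetric and there are no self-loops.
    w = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    act = [1.0] * n  # assumption: every sentence starts fully activated
    for _ in range(max_iter):
        new = []
        for j in range(n):
            # Weighted sum of neighbour activations, squashed by a sigmoid.
            net = sum(w[i][j] * act[i] for i in range(n))
            new.append(1.0 / (1.0 + math.exp(-steepness * (net - theta))))
        delta = sum(abs(x - y) for x, y in zip(new, act))
        act = new
        if delta < tol:  # stop once the activations have stabilized
            break
    order = sorted(range(n), key=lambda k: act[k], reverse=True)
    return order, act


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "The cat chased the mouse on the mat.",
        "Stock prices fell sharply today.",
    ]
    order, act = hopfield_rank(doc)
    for k in order:
        print(f"{act[k]:.3f}  {doc[k]}")
```

In this sketch a sentence that shares no vocabulary with the rest of the document receives no incoming activation and ends up at the bottom of the ranking, while mutually similar sentences reinforce one another. An extractive summary could then be formed by taking the top-ranked sentences. How the paper combines such a graph-based score with its content-based features is not described in the abstract and is not modeled here.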
