An Automatic Thai Text Summarization Using Topic Sensitive PageRank

The continuing growth of the World Wide Web and of online text collections makes a large volume of information available to users. Automatic text summarization allows users to quickly understand documents. In this paper, we propose an automated technique for single-document summary extraction in the Thai language that combines content-based and graph-based features, and we introduce the Topic-Sensitive PageRank algorithm as a technique for ranking text segments. A series of experiments was performed on a Thai document collection. The results show the superiority of the proposed technique over reference systems.
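The abstract does not spell out how the two kinds of features are combined or how the graph is built. The following is therefore only a minimal sketch under stated assumptions, not the authors' method. It assumes that sentences have already been word-segmented, because Thai is written without spaces between words. The graph-based feature is assumed to be a TextRank-style word-overlap similarity between sentences. The content-based feature is a hypothetical term-frequency score, and it serves as the biased teleport (topic) vector of a Topic-Sensitive PageRank iteration. The function names, the scoring formulas, and the damping factor `d = 0.85` are all illustrative choices.

```python
import math
from collections import Counter


def overlap_similarity(a, b):
    """Assumed graph feature: shared words normalized by log sentence lengths (TextRank-style)."""
    common = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return common / denom if common and denom > 0 else 0.0


def content_scores(sentences):
    """Hypothetical content feature: mean document term frequency per sentence, normalized to sum to 1."""
    tf = Counter(w for s in sentences for w in s)
    raw = [sum(tf[w] for w in s) / len(s) if s else 0.0 for s in sentences]
    total = sum(raw)
    n = len(sentences)
    return [x / total for x in raw] if total > 0 else [1.0 / n] * n


def topic_sensitive_pagerank(weights, teleport, d=0.85, tol=1e-9, max_iter=200):
    """Power iteration for PageRank whose random jumps follow `teleport` instead of a uniform vector."""
    n = len(weights)
    out = [sum(row) for row in weights]
    rank = list(teleport)
    for _ in range(max_iter):
        # Rank held by sentences with no edges is redistributed along the teleport vector.
        dangling = sum(rank[i] for i in range(n) if out[i] == 0)
        new = []
        for j in range(n):
            incoming = sum(rank[i] * weights[i][j] / out[i] for i in range(n) if out[i] > 0)
            new.append((1 - d) * teleport[j] + d * (incoming + dangling * teleport[j]))
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


def summarize(sentences, k=2, d=0.85):
    """Return the indices of the k top-ranked sentences in original document order."""
    if not sentences:
        return []
    n = len(sentences)
    weights = [[overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = topic_sensitive_pagerank(weights, content_scores(sentences), d)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

In this sketch, the content-based scores influence the result only through the teleport vector. The similarity graph supplies the graph-based signal. A sentence that shares no words with any other sentence can receive rank only through random jumps, so it tends to drop out of the extract. Other ways of combining the two features, such as a weighted sum of separate scores, are equally plausible readings of the abstract.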
