Automatic Text Summarization Using a Machine Learning Approach

In this paper we address the automatic summarization task. Recent work on extractive summary generation relies on heuristics, but few studies indicate how to select the relevant features. We present a summarization procedure based on trainable Machine Learning algorithms that uses a set of features extracted directly from the original text. These features are of two kinds: statistical, based on the frequency of some elements in the text; and linguistic, extracted from a simplified argumentative structure of the text. We also report computational results obtained by applying our summarizer to some well-known text databases, and we compare these results with those of baseline summarization procedures.
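To make the idea of trainable extractive summarization concrete, the following minimal sketch treats each sentence as an example to be classified as "in summary" or "not in summary". Everything specific in it is an assumption made for illustration: the abstract does not name the learning algorithm or list the features, so a small Naive Bayes classifier stands in for the trainable algorithm, and three simple binary features (word frequency, position, length) stand in for the statistical features. The linguistic features derived from argumentative structure are not modelled. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens only (an illustrative simplification)."""
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_features(sentences):
    """One tuple of binary features per sentence (assumed, illustrative features)."""
    if not sentences:
        return []
    tf = Counter(w for s in sentences for w in tokenize(s))
    mean_tf = []
    for s in sentences:
        words = tokenize(s)
        mean_tf.append(sum(tf[w] for w in words) / len(words) if words else 0.0)
    threshold = sum(mean_tf) / len(mean_tf)
    return [
        (
            mean_tf[i] >= threshold,   # statistical: above-average word frequency
            i == 0,                    # positional: first sentence of the text
            len(tokenize(s)) >= 6,     # length: at least six words
        )
        for i, s in enumerate(sentences)
    ]


class NaiveBayes:
    """Bernoulli Naive Bayes over binary features, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: sum(1 for t in y if t == c) / len(y) for c in self.classes}
        self.cond = {}
        for c in self.classes:
            rows = [x for x, t in zip(X, y) if t == c]
            # Smoothed estimate of P(feature_j is True | class c)
            self.cond[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))
            ]
        return self

    def score(self, x, c):
        """Log of the unnormalised posterior for class c."""
        logp = math.log(self.prior[c])
        for j, v in enumerate(x):
            p = self.cond[c][j]
            logp += math.log(p if v else 1 - p)
        return logp

    def predict(self, x):
        return max(self.classes, key=lambda c: self.score(x, c))


def summarize(model, text, k=1):
    """Keep the k sentences with the highest log-odds of being in the summary.

    Assumes the model was trained on labels True (in summary) and False.
    """
    sentences = split_sentences(text)
    feats = sentence_features(sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: model.score(feats[i], True) - model.score(feats[i], False),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]  # restore document order
```

In use, the classifier would be fitted on feature vectors from documents whose sentences have been labelled as belonging to a reference summary or not, and `summarize` would then be applied to unseen text. Selected sentences are returned in their original order so that the extract reads as a condensed version of the source.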
