Combining Multiple Features for Automatic Text Summarization through Machine Learning

In this paper, we explore multiple features for extractive automatic summarization using machine learning. They comprise the features of SuPor-2, a supervised summarizer for Brazilian Portuguese, and graph-based features that mirror complex network measures. We evaluate four different classifiers, as well as automatic feature selection. ROUGE is used to assess single-document summarization of news texts.
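As an illustration of what a graph-based sentence feature can look like, the Python sketch below computes one of the simplest complex network measures, node degree, over a sentence graph. The linking rule (two sentences are connected when they share at least one non-stopword), the tiny English `STOPWORDS` set and the function names `content_words` and `degree_features` are all hypothetical choices made for this example. They are not the graph construction used in this work or in SuPor-2, and a Portuguese corpus would need a Portuguese stopword list.

```python
import re
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "on", "is", "are", "to", "for"}


def content_words(sentence):
    """Return the lower-cased word tokens of a sentence, minus stopwords.

    In Python 3, \\w is Unicode-aware, so accented letters stay inside words.
    """
    return {w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS}


def degree_features(sentences):
    """Return the degree of each sentence node in an assumed sentence graph.

    Two sentences are linked by an undirected edge if they share at least
    one content word; a sentence's degree is its number of such neighbours.
    """
    words = [content_words(s) for s in sentences]
    degree = [0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        if words[i] & words[j]:  # shared content word -> edge between i and j
            degree[i] += 1
            degree[j] += 1
    return degree
```

For the sentences "The network grows fast.", "A complex network has hubs.", "Hubs attract links." and "Rain fell today.", `degree_features` returns `[1, 2, 1, 0]`: the second sentence links to both neighbours, while the last one is isolated. In a supervised setup, each such value could become one column of the feature vector given to a classifier, alongside the other sentence features.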
