GA, MR, FFNN, PNN and GMM based models for automatic text summarization

This work proposes an approach to improving content selection in automatic text summarization using statistical tools. The approach is a trainable summarizer that generates summaries from several sentence features, including sentence position, positive keywords, negative keywords, sentence centrality, sentence resemblance to the title, inclusion of named entities, inclusion of numerical data, relative sentence length, the bushy path of the sentence, and the aggregated similarity of each sentence. First, we investigate the effect of each sentence feature on the summarization task. Then we use all features in combination to train genetic algorithm (GA) and mathematical regression (MR) models to obtain a suitable combination of feature weights. We also use all feature parameters to train a feed-forward neural network (FFNN), a probabilistic neural network (PNN), and a Gaussian mixture model (GMM), constructing a text summarizer from each model. Furthermore, we use models trained on one language to test summarization performance on the other language. The performance of the proposed approach is measured at several compression rates on a corpus of 100 Arabic political articles and 100 English religious articles. The results are promising, especially for the GMM approach.
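To make the idea of a weighted feature combination concrete, the following is a minimal sketch of how a weight vector, once obtained, might be applied to select sentences at a given compression rate. It assumes the combination is a simple weighted sum, which the abstract does not state. The function names, the three-column feature matrix, and the weight values are all hypothetical and chosen only for illustration; in the proposed approach the weights would come from the trained GA or MR model.

```python
def score_sentences(features, weights):
    """Score each sentence as the weighted sum of its feature values.

    Assumption: a linear combination; the weights here are illustrative,
    not values learned by a GA or MR model.
    """
    return [sum(w * f for w, f in zip(weights, row)) for row in features]


def summarize(sentences, features, weights, compression_rate):
    """Keep the top-scoring sentences, returned in original document order."""
    scores = score_sentences(features, weights)
    # Number of sentences to keep at this compression rate (at least one).
    k = max(1, round(compression_rate * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


# Toy input: five sentences, three hypothetical features per sentence
# (for example position, title resemblance, centrality), scaled to [0, 1].
sentences = ["S1", "S2", "S3", "S4", "S5"]
features = [
    [1.0, 0.2, 0.5],
    [0.8, 0.1, 0.1],
    [0.6, 0.9, 0.7],
    [0.4, 0.3, 0.2],
    [0.2, 0.8, 0.9],
]
weights = [0.5, 0.3, 0.2]  # hypothetical weights

summary = summarize(sentences, features, weights, compression_rate=0.4)
print(summary)
```

Returning the selected sentences in document order rather than score order is a design choice here, made so that the extract reads in the same sequence as the source text.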
