Multi document summarization based on news components using fuzzy cross-document relations

The volume of online information grows enormously every day thanks to the World Wide Web. Search engines often return an abundant collection of articles, in particular news articles retrieved from different sources that report on the same event. In this work, we aim to produce high-quality multi-document news summaries by taking into account the generic components of a news story within a specific domain. We also present an effective method, named Genetic-Case Base Reasoning, to identify cross-document relations from un-annotated texts. Following that, we propose a new sentence scoring model based on fuzzy reasoning over the identified cross-document relations. The experimental findings show that the proposed approach performed better than the conventional graph-based and cluster-based approaches.
