Multi-document summarization based on the Yago ontology

Sentence-based multi-document summarization is the task of generating a succinct summary of a document collection by selecting its most salient sentences. In recent years, the increasing availability of semantics-based models (e.g., ontologies and taxonomies) has prompted researchers to investigate their usefulness for improving summarizer performance. However, semantics-based document analysis is often applied only as a preprocessing step, and the discovered knowledge is not integrated into the summarization process itself. This paper proposes a novel summarizer, the Yago-based Summarizer, that relies on an ontology-based evaluation and selection of the document sentences. To capture the actual meaning and context of the document sentences and to generate sound summaries, an established entity recognition and disambiguation step based on the Yago ontology is integrated into the summarization process. Experimental results on the DUC'04 benchmark collections demonstrate the effectiveness of the proposed approach compared with a large number of competitors, as well as the qualitative soundness of the generated summaries.
