The notion of diversity in graphical entity summarisation on semantic knowledge graphs

Given an entity represented by a single node q in a semantic knowledge graph D, the Graphical Entity Summarisation (GES) problem consists in selecting from D a very small surrounding graph S that serves as a generic summary of the information concerning q, subject to a given limit on the size of S. This article concerns the role of diversity in this fairly novel problem. It gives an overview of the diversity concept in information retrieval and proposes how to adapt it to GES. A measure of diversity for GES, called ALC, is defined, and two algorithms are presented: the baseline, diversity-oblivious PRECIS and the diversity-aware DIVERSUM. A reported experiment shows that DIVERSUM achieves higher values of the ALC diversity measure than PRECIS. Next, an objective evaluation experiment demonstrates that the diversity-aware algorithm is superior to the diversity-oblivious one in terms of fact selection: DIVERSUM clearly achieves higher recall than PRECIS on ground-truth reference entity summaries extracted from Wikipedia. We also report another intrinsic experiment, in which the output of the diversity-aware algorithm is significantly preferred by human expert evaluators. Importantly, the user feedback clearly indicates that the notion of diversity is the key reason for the preference. In addition, the experiment was repeated twice on an anonymous sample of the broad population of Internet users by means of a crowdsourcing platform, which further confirms the results mentioned above.
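To make the problem setting concrete, the following is a minimal sketch under stated assumptions; it is not PRECIS, DIVERSUM, or the ALC measure, none of which are specified in this abstract. It assumes the knowledge graph D is a list of (subject, predicate, object) triples, that the size limit on S is an edge budget, and that recall is the fraction of reference facts present in the summary. The function names `nearest_edges_summary` and `recall`, and the toy data, are hypothetical.

```python
from collections import deque


def nearest_edges_summary(triples, q, budget):
    """Illustrative selection only: take up to `budget` triples in
    breadth-first order from q, treating edges as undirected."""
    adjacency = {}
    for triple in triples:
        subject, _, obj = triple
        adjacency.setdefault(subject, []).append((obj, triple))
        adjacency.setdefault(obj, []).append((subject, triple))

    selected = []
    seen_nodes = {q}
    queue = deque([q])
    while queue and len(selected) < budget:
        node = queue.popleft()
        for neighbour, triple in adjacency.get(node, []):
            if triple not in selected:
                selected.append(triple)
                if len(selected) == budget:
                    break
            if neighbour not in seen_nodes:
                seen_nodes.add(neighbour)
                queue.append(neighbour)
    return selected


def recall(summary, reference):
    """Fraction of reference facts that appear in the summary."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(summary) & reference) / len(reference)


# Toy knowledge graph D (hypothetical data) around the entity q = "film".
D = [
    ("film", "director", "alice"),
    ("film", "actor", "bob"),
    ("film", "actor", "carol"),
    ("alice", "bornIn", "paris"),
]
S = nearest_edges_summary(D, "film", budget=2)
reference_summary = [D[0], D[2]]  # assumed ground-truth facts
score = recall(S, reference_summary)
```

With a budget of two edges, this naive selection keeps the first two facts adjacent to the entity, so only one of the two assumed reference facts is recovered. A diversity-aware method would differ in how it chooses which edges fill the budget; the evaluation by recall would stay the same.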
