Summarizing semantic graphs: a survey

The explosion in the amount of available RDF data has led to the need to explore, query and understand such data sources. Due to the complex structure of RDF graphs and their heterogeneity, the exploration and understanding tasks are significantly harder than in relational databases, where the schema can serve as a first step toward understanding the structure. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of an RDF summary, and there are many approaches to building such summaries; each is better suited to some uses, and each presents specific challenges with respect to its construction. This survey is the first to comprehensively cover summarization methods for semantic RDF graphs. We propose a taxonomy of existing works in this area, also including some closely related works developed prior to the adoption of RDF in the data management community; we present the concepts at the core of each approach and outline their main technical aspects and implementation. We hope the survey will help readers understand this scientifically rich area and identify the most pertinent summarization method for a variety of usage scenarios.
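
To make the idea of a concise yet faithful summary concrete, here is a minimal sketch of one simple way such a summary could be built: a structural quotient summary in which nodes that have the same set of outgoing properties are merged into one summary node, and each triple becomes an edge between summary nodes. The choice of equivalence relation (same outgoing property set), the representation of triples as plain string tuples, and all function names are illustrative assumptions, not a method prescribed by this survey; rdf:type is treated like any other property here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
SummaryEdge = Tuple[FrozenSet[str], str, FrozenSet[str]]


def property_signatures(triples: Iterable[Triple]) -> Dict[str, FrozenSet[str]]:
    """Map every node (subject or object) to the set of properties it has as subject."""
    out_props: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for s, p, o in triples:
        out_props[s].add(p)
        nodes.update((s, o))
    # Nodes that only appear as objects (e.g. literals) get the empty signature.
    return {n: frozenset(out_props.get(n, ())) for n in nodes}


def quotient_summary(triples: Iterable[Triple]) -> Set[SummaryEdge]:
    """Merge nodes with equal signatures; keep one edge per (class, property, class)."""
    triples: List[Triple] = list(triples)
    sig = property_signatures(triples)
    return {(sig[s], p, sig[o]) for s, p, o in triples}


# Hypothetical toy data: two employees and one company.
data = [
    (":alice", ":worksFor", ":acme"),
    (":alice", ":name", '"Alice"'),
    (":bob", ":worksFor", ":acme"),
    (":bob", ":name", '"Bob"'),
    (":acme", ":name", '"ACME"'),
]

summary = quotient_summary(data)
for src, prop, dst in sorted(summary, key=lambda e: (sorted(e[0]), e[1], sorted(e[2]))):
    print(sorted(src), prop, sorted(dst))
```

On this toy graph, the five triples collapse into three summary edges over three summary nodes (employees, companies, literals), since :alice and :bob share the same property set. Other equivalence relations, such as bisimulation or type-based grouping, trade summary size against faithfulness differently.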
