RDF graph summarization: principles, techniques and applications

The explosion in the amount of RDF data on the Web has led to the need to explore, query and understand such data sources. The task is challenging due to the complex and heterogeneous structure of RDF graphs which, unlike relational databases, do not come with a structure-dictating schema. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single notion of an RDF summary, and there are many approaches to building such summaries; the summarization goal and the main computational tools employed for summarizing graphs are the main factors behind this diversity. This tutorial presents a structured analysis and comparison of existing works in the area of RDF summarization; it is based upon a recent survey which we co-authored with colleagues [3]. We present the concepts at the core of each approach and outline their main technical aspects and implementation. We conclude by identifying the most pertinent summarization method for different usage scenarios, and discussing areas where future effort

[1] Steffen Staab, et al. SchemEX - Efficient construction of a data catalogue by stream-based indexing of linked data, 2012, J. Web Semant.

[2] François Goasdoué, et al. Compact Summaries of Rich Heterogeneous Graphs, 2018.

[3] Ana Carolina Salgado, et al. A Method for Building Personalized Ontology Summaries, 2013, J. Inf. Data Manag.

[4] Claudio Lucchese, et al. Summarizing Linked Data RDF Graphs Using Approximate Graph Pattern Mining, 2016, EDBT.

[5] Pascal Hitzler, et al. Logical Linked Data Compression, 2013, ESWC.

[6] Martin Doerr, et al. X3ML mapping framework for information integration in cultural heritage and beyond, 2017, International Journal on Digital Libraries.

[7] Claudio Lucchese, et al. RDF Graph Summarization Based on Approximate Patterns, 2015, ISIP.

[8] Edith Schonberg, et al. Scalable Semantic Retrieval through Summarization and Refinement, 2007, AAAI.

[9] Sourav S. Bhowmick, et al. Summarizing Static and Dynamic Big Graphs, 2017, Proc. VLDB Endow.

[10] Dan Suciu, et al. Index Structures for Path Expressions, 1999, ICDT.

[11] François Goasdoué, et al. Summarizing semantic graphs: a survey, 2018, The VLDB Journal.

[12] Jeffrey F. Naughton, et al. Covering indexes for branching path queries, 2002, SIGMOD '02.

[13] Danai Koutra, et al. Graph Summarization Methods and Applications: A Survey, 2016.

[14] François Goasdoué, et al. Incremental structural summarization of RDF graphs, 2019, EDBT.

[15] Roy Goldman, et al. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases, 1997, VLDB.

[16] Nisheeth Shrivastava, et al. Graph summarization with bounded error, 2008, SIGMOD Conference.

[17] Dimitris Kotzinos, et al. Quality metrics for RDF graph summarization, 2019, Semantic Web.

[18] Katja Hose, et al. FedX: Optimization Techniques for Federated Query Processing on Linked Data, 2011, SEMWEB.

[19] Gang Wu, et al. Identifying Potentially Important Concepts and Relations in an Ontology, 2008, International Semantic Web Conference.

[20] Edith Schonberg, et al. Scalable highly expressive reasoner (SHER), 2009, J. Web Semant.

[21] Jignesh M. Patel, et al. Efficient aggregation for graph summarization, 2008, SIGMOD Conference.

[22] Qi He, et al. Distributed Graph Summarization, 2014, CIKM.

[23] Jan Hidders, et al. A Structural Approach to Indexing Triples, 2012, ESWC.

[24] Yevgeny Kazakov, et al. Abstraction Refinement for Ontology Materialization, 2014, Description Logics.

[25] Andrea Maurino, et al. ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization, 2016, SumPre@ESWC.

[26] Kostas Stefanidis, et al. Exploring RDFS KBs Using Summaries, 2018, International Semantic Web Conference.

[27] Vojtech Svátek, et al. Dataset Summary Visualization with LODSight, 2015, ESWC.

[28] François Goasdoué, et al. Browsing Linked Data Catalogs with LODAtlas, 2018, International Semantic Web Conference.

[29] Dimitris Plexousakis, et al. Ontology Evolution: Assisting Query Migration, 2012, ER.

[30] Katja Hose, et al. Towards benefit-based RDF source selection for SPARQL queries, 2012, SWIM '12.

[31] Mariano P. Consens, et al. ExpLOD: Summary-Based Exploration of Interlinking and RDF Usage in the Linked Open Data Cloud, 2010, ESWC.

[32] Dimitris Plexousakis, et al. Exploring Importance Measures for Summarizing RDF/S KBs, 2017, ESWC.

[33] Lora Aroyo, et al. Extracting Core Knowledge from Linked Data, 2011, COLD.

[34] Jeffrey D. Ullman, et al. Representative objects: concise representations of semistructured, hierarchical data, 1997, Proceedings 13th International Conference on Data Engineering.

[35] Dimitris Plexousakis, et al. Ontology Evolution in Data Integration: Query Rewriting to the Rescue, 2011, ER.

[36] Edith Schonberg, et al. The Summary Abox: Cutting Ontologies Down to Size, 2006, SEMWEB.

[37] François Goasdoué, et al. A Framework for Efficient Representative Summarization of RDF Graphs, 2017, SEMWEB.

[38] Mariano P. Consens, et al. S+EPPs: Construct and Explore Bisimulation Summaries, plus Optimize Navigational Queries; all on Existing SPARQL Systems, 2015, Proc. VLDB Endow.

[39] Dimitris Plexousakis, et al. RDF Digest: Efficient Summarization of RDF/S KBs, 2015, ESWC.

[40] Dimitris Plexousakis, et al. Ontology evolution without tears, 2013, J. Web Semant.

[41] Yinghui Wu, et al. Mining Summaries for Knowledge Graph Search, 2018, IEEE Transactions on Knowledge and Data Engineering.

[42] Robert E. Tarjan, et al. Three Partition Refinement Algorithms, 1987, SIAM J. Comput.

[43] Lei Zou, et al. Semantic SPARQL Similarity Search Over RDF Knowledge Graphs, 2016, Proc. VLDB Endow.

[44] Marcin Sydow, et al. The notion of diversity in graphical entity summarisation on semantic knowledge graphs, 2013, Journal of Intelligent Information Systems.

[45] Jan Hidders, et al. External memory K-bisimulation reduction of big graphs, 2012, CIKM.

[46] Anas Alzogbi, et al. Similar Structures inside RDF-Graphs, 2013, LDOW.

[47] V. S. Subrahmanian, et al. GRIN: A Graph Based RDF Index, 2007, AAAI.

[48] Xiang Zhang, et al. Ontology summarization based on rdf sentence graph, 2007, WWW '07.

[49] Paolo Tomeo, et al. Generating examples of paths summarizing RDF datasets, 2016, SEMANTiCS.

[50] Dimitris Plexousakis, et al. Ontology understanding without tears: The summarization approach, 2017, Semantic Web.

[51] François Goasdoué, et al. Query-Oriented Summarization of RDF Graphs, 2015, Proc. VLDB Endow.

[52] Martin Doerr, et al. X3ML Framework: An Effective Suite for Supporting Data Mappings, 2015, EMF-CRM@TPDL.

[53] Boris Motik, et al. Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation, 2018, WWW.

[54] Li Ma, et al. SHIN ABox Reduction, 2006, Description Logics.

[55] Thomas A. Henzinger, et al. Computing simulations on finite and infinite graphs, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[56] Ana Carolina Salgado, et al. Summarizing ontology-based schemas in PDMS, 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).

[57] Abraham Bernstein, et al. Avalanche: Putting the Spirit of the Web back into Semantic Web Querying, 2010, ISWC Posters&Demos.

[58] Salvatore Orlando, et al. A Unifying Framework for Mining Approximate Top-k Binary Patterns, 2014, IEEE Transactions on Knowledge and Data Engineering.

[59] Xiang Zhang, et al. Summarizing Vocabularies in the Global Semantic Web, 2009, Journal of Computer Science and Technology.