RDF Graph Summarization Based on Approximate Patterns

The Linked Open Data (LOD) cloud brings together information described in RDF and stored on the web in (possibly distributed) RDF Knowledge Bases (KBs). The data in these KBs are not necessarily described by a known schema, and querying all the interlinked KBs to acquire the necessary information is often extremely time consuming. Even when the KB schema is known, we still need to know which parts of the schema are actually used. We address this problem by summarizing large RDF KBs using top-K approximate RDF graph patterns, which we transform into an RDF schema that describes the contents of the KB. This schema can describe the KB more accurately than an existing schema, because it reflects only the parts of the schema that are actually used by the existing data. We also record the number of instances of each pattern, which allows the expected number of query results to be estimated. The RDF graph summary can then be queried to determine whether the necessary information is present and, if it is present in significant numbers, whether it should be included in a federated query result.
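
To make the idea of a pattern-based summary with instance counts concrete, the following is a minimal sketch under strong simplifying assumptions. It groups subjects by the exact set of properties they use and keeps the K most frequent sets. The approximate (noise-tolerant) pattern mining described above is not reproduced here. The function names `summarize` and `has_enough` and the `ex:` sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def summarize(triples, k):
    """Return the k most frequent property sets with their subject counts.

    Simplification: patterns are exact property sets per subject,
    not the approximate patterns used in the paper.
    """
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)
    # Each distinct property set is one candidate pattern.
    counts = Counter(frozenset(props) for props in props_by_subject.values())
    return counts.most_common(k)

def has_enough(summary, prop, min_instances):
    """Check the summary (not the KB) for a property used by enough subjects."""
    total = sum(count for pattern, count in summary if prop in pattern)
    return total >= min_instances

# Hypothetical sample data.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:paper1", "ex:title", "A title"),
]

summary = summarize(triples, k=1)
```

With this sample data, `has_enough(summary, "ex:name", 2)` holds, while `has_enough(summary, "ex:title", 1)` does not, because the single-subject pattern falls outside the top-1 summary. This illustrates how a summary could be consulted before deciding whether a KB is worth querying.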
