Deriving an Emergent Relational Schema from RDF Data

We motivate and describe techniques for detecting an "emergent" relational schema in RDF data. We show that, on a wide variety of datasets, the discovered structure explains well over 90% of the RDF triples. We also describe technical solutions to the semantic challenge of giving these emergent tables, columns, and inter-table relationships short names that humans find logical. Our techniques can be exploited in many ways, e.g., to improve the efficiency of SPARQL systems, or to run existing SQL-based applications on top of any RDF dataset using an RDBMS.
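
To make the notion of an emergent table and of triple coverage concrete, the following is a minimal toy sketch, not the method described in this work. It assumes that subjects sharing exactly the same set of properties form one candidate table, and that a triple counts as "explained" when its subject falls into such a table. The function names, the `min_subjects` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def emergent_tables(triples, min_subjects=2):
    """Group subjects by the exact set of properties they use.

    Assumption: a property set shared by at least `min_subjects`
    subjects is treated as one candidate table.
    """
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)

    subjects_by_propset = defaultdict(list)
    for subject, props in props_by_subject.items():
        subjects_by_propset[frozenset(props)].append(subject)

    return {
        propset: subjects
        for propset, subjects in subjects_by_propset.items()
        if len(subjects) >= min_subjects
    }


def coverage(triples, tables):
    """Fraction of triples whose subject belongs to some candidate table."""
    covered = {s for subjects in tables.values() for s in subjects}
    if not triples:
        return 0.0
    return sum(1 for s, _p, _o in triples if s in covered) / len(triples)


# Hypothetical sample data: two book-like subjects share a property set.
triples = [
    ("ex:b1", "title", "A"), ("ex:b1", "author", "X"),
    ("ex:b2", "title", "B"), ("ex:b2", "author", "Y"),
    ("ex:p1", "name", "X"),
    ("ex:odd", "note", "z"),
]
tables = emergent_tables(triples)
ratio = coverage(triples, tables)
```

In this toy data, the two book-like subjects form a single candidate table with columns `title` and `author`, while the two remaining subjects stay outside any table. A real system would also have to merge similar property sets, handle multi-valued and missing properties, and discover relationships between tables, none of which this sketch attempts.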
