A survey of RDF management technologies and benchmark datasets

Abstract: With the rapid development of the Semantic Web and related areas, the volume of Resource Description Framework (RDF) data has grown significantly. Managing this data efficiently has become a challenging task that has attracted considerable research attention. This paper surveys state-of-the-art RDF storage and query technologies, organized by several classification criteria. It also introduces and compares several widely used benchmark datasets. Finally, it discusses open research challenges and future opportunities.
