Scalable Semantic Web Data Management Using Vertical Partitioning

Efficient management of RDF data is an important factor in realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and explore the fundamental scalability limitations of these approaches. We review the state of the art for improving performance for RDF databases and consider a recent suggestion, "property tables." We then discuss practically and empirically why this solution has undesirable features. As an improvement, we propose an alternative solution: vertically partitioning the RDF data. We compare the performance of vertical partitioning with prior art on queries generated by a Web-based RDF browser over a large-scale (more than 50 million triples) catalog of library data. Our results show that a vertical partitioned schema achieves similar performance to the property table technique while being much simpler to design. Further, if a column-oriented DBMS (a database architected specially for the vertically partitioned case) is used instead of a row-oriented DBMS, another order of magnitude performance improvement is observed, with query times dropping from minutes to several seconds.

[1]  Annika Hinze,et al.  Storing RDF as a graph , 2003, Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726).

[2]  Marcin Zukowski,et al.  MonetDB/X100: Hyper-Pipelining Query Execution , 2005, CIDR.

[3]  Daniela Florescu,et al.  Storing and Querying XML Data using an RDMBS , 1999, IEEE Data Eng. Bull..

[4]  B. E. Eckbo,et al.  Appendix , 1826, Epilepsy Research.

[5]  Vassilis Christophides,et al.  The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases , 2001, SemWeb.

[6]  Jeffrey F. Naughton,et al.  Generalized Search Trees for Database Systems , 1995, VLDB.

[7]  Daniel J. Abadi,et al.  Integrating compression and execution in column-oriented database systems , 2006, SIGMOD Conference.

[8]  Frank van Harmelen,et al.  Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema , 2002, SEMWEB.

[9]  Vassilis Christophides,et al.  Benchmarking Database Representations of RDF/S Stores , 2005, SEMWEB.

[10]  Setrag Khoshafian,et al.  A decomposition storage model , 1985, SIGMOD Conference.

[11]  Don S. Batory,et al.  On searching transposed files , 1978, ACM Trans. Database Syst..

[12]  Nicholas Gibbins,et al.  3store: Efficient Bulk RDF Storage , 2003, PSSS.

[13]  Sneha Godbole Cse 718 -seminar Report Scalable Semantic Web Data Management Using Vertical Partitioning , 2008 .

[14]  Martin L. Kersten,et al.  Database Architecture Optimized for the New Bottleneck: Memory Access , 1999, VLDB.

[15]  David J. DeWitt,et al.  Materialization Strategies in a Column-Oriented DBMS , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[16]  Eugene Inseok Chong,et al.  An Efficient SQL-based RDF Querying Scheme , 2005, VLDB.

[17]  Roger MacNicol,et al.  Sybase IQ Multiplex - Designed For Analytics , 2004, VLDB.

[18]  Martin L. Kersten,et al.  MIL primitives for querying a fragmented world , 1999, The VLDB Journal.

[19]  Daniel J. Abadi,et al.  Using The Barton Libraries Dataset As An RDF benchmark , 2007 .

[20]  Rakesh Agrawal,et al.  Storage and Querying of E-Commerce Data , 2001, VLDB.

[21]  Kevin Wilkinson,et al.  Jena Property Table Implementation , 2006 .

[22]  David J. DeWitt,et al.  Relational Databases for Querying XML Documents: Limitations and Opportunities , 1999, VLDB.

[23]  Vishu Krishnamurthy,et al.  Performance Challenges in Object-Relational DBMSs , 1999, IEEE Data Eng. Bull..

[24]  Daniel J. Abadi,et al.  Column Stores for Wide and Sparse Data , 2007, CIDR.

[25]  Michael Stonebraker,et al.  C-Store: A Column-oriented DBMS , 2005, VLDB.

[26]  Jeffrey F. Naughton,et al.  Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[27]  Dave Reynolds,et al.  Efficient RDF Storage and Retrieval in Jena2 , 2003, SWDB.

[28]  Anneli Folkesson,et al.  World Wide Web Consortium (W3C) , 2005 .