Constructing Large-Scale Semantic Web Indices for the Six RDF Collation Orders

The Semantic Web community collects large amounts of valuable, publicly available RDF data in order to drive the success of the Semantic Web. Processing these datasets efficiently requires indexing them. Semantic Web indices exploit the simplicity of the RDF data model: its basic concept is the triple of subject, predicate and object, so there are only 3! = 6 different collation orders. On the one hand, when all 6 collation orders are indexed, fast merge joins (which consume the sorted output of the indices) can be applied as often as possible during query processing. On the other hand, constructing the indices for 6 different collation orders is very time-consuming for large-scale datasets. Hence this paper focuses on efficient Semantic Web index construction for large-scale datasets on today's multi-core computers. We complete our discussion with a comprehensive performance evaluation, in which our approach efficiently constructs the indices for over 1 billion triples of real-world data.
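
To make the role of the six collation orders concrete, the following is a minimal in-memory sketch, not the construction approach of the paper. It sorts a toy set of triples in each of the six orders and then runs a merge join over two of the sorted results. The sample triples and the helper names `build_indices` and `merge_join` are assumptions introduced only for illustration; a real index would use external sorting and B+-tree-like structures rather than Python lists.

```python
from itertools import permutations

# Hypothetical toy data: (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]

def build_indices(data):
    """Return a dict mapping each collation order (e.g. 'POS') to the
    triples sorted in that order. There are 3! = 6 such orders."""
    indices = {}
    for order in permutations(range(3)):
        name = "".join("SPO"[i] for i in order)
        indices[name] = sorted(
            data, key=lambda t, o=order: tuple(t[i] for i in o)
        )
    return indices

def merge_join(left, right, left_key, right_key):
    """Join two lists that are already sorted on their join keys.
    Both inputs are scanned once; runs of equal keys are paired up."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left_key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    result.append((a, b))
            i, j = i_end, j_end
    return result

indices = build_indices(triples)

# Query sketch: ?x knows ?y . ?y knows ?z
# The POS order yields "knows" triples sorted by object (?y),
# the PSO order yields "knows" triples sorted by subject (?y).
left = [t for t in indices["POS"] if t[1] == "knows"]
right = [t for t in indices["PSO"] if t[1] == "knows"]
joined = merge_join(left, right, lambda t: t[2], lambda t: t[0])
```

Because both inputs arrive already sorted on the join variable, the join needs no additional sorting step; that is the benefit of having every collation order available, and the cost is the sixfold sorting work in `build_indices`.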
