Compressed Indexes for Fast Search of Semantic Data

The sheer increase in the volume of RDF data demands efficient solutions to the triple indexing problem, that is, devising a compressed data structure that represents RDF triples compactly while guaranteeing fast pattern matching operations. This problem lies at the heart of delivering good practical performance when resolving complex SPARQL queries on large RDF datasets. In this work, we propose a trie-based index layout to solve the problem and introduce two novel techniques that reduce its space usage. An extensive experimental analysis conducted over a wide range of publicly available real-world datasets reveals that our best space/time trade-off configuration substantially outperforms existing state-of-the-art solutions, taking 30-60% less space and speeding up query execution by a factor of 2 to 81.
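To make the triple indexing problem concrete, the following is a minimal, uncompressed sketch of a three-level trie over integer-encoded triples, with a pattern matching routine in which `None` stands for a wildcard. It is not the layout proposed in this work: the class name `TrieIndex`, the single subject-predicate-object ordering, and the plain Python lists are all assumptions made for illustration, and none of the compression techniques mentioned above are shown.

```python
from bisect import bisect_left


class TrieIndex:
    """Hypothetical three-level trie over integer triples in (S, P, O) order."""

    def __init__(self, triples):
        self.l1, self.l2, self.l3 = [], [], []  # node labels, one list per level
        self.p1, self.p2 = [], []               # start offset of each node's children
        prev = None
        for s, p, o in sorted(set(triples)):
            new_s = prev is None or s != prev[0]
            new_p = new_s or p != prev[1]
            if new_s:
                self.l1.append(s)
                self.p1.append(len(self.l2))
            if new_p:
                self.l2.append(p)
                self.p2.append(len(self.l3))
            self.l3.append(o)
            prev = (s, p, o)
        # Sentinels so that the children of node i lie in [p[i], p[i + 1]).
        self.p1.append(len(self.l2))
        self.p2.append(len(self.l3))

    @staticmethod
    def _select(level, lo, hi, key):
        """Positions in level[lo:hi] matching key; a None key matches all."""
        if key is None:
            return range(lo, hi)
        i = bisect_left(level, key, lo, hi)  # sibling labels are sorted
        return range(i, i + 1) if i < hi and level[i] == key else range(0)

    def match(self, s=None, p=None, o=None):
        """Yield every stored triple that matches the pattern."""
        for i in self._select(self.l1, 0, len(self.l1), s):
            for j in self._select(self.l2, self.p1[i], self.p1[i + 1], p):
                for k in self._select(self.l3, self.p2[j], self.p2[j + 1], o):
                    yield (self.l1[i], self.l2[j], self.l3[k])


index = TrieIndex([(1, 2, 3), (1, 2, 4), (1, 5, 3), (2, 2, 3)])
print(list(index.match(1, 2)))        # subject and predicate bound
print(list(index.match(None, 2, 3)))  # subject is a wildcard
```

In this sketch, a pattern whose bound components form a prefix of the ordering is resolved with binary searches alone, whereas a wildcard in an earlier position forces a scan of that level. Avoiding such scans is a common reason for indexes to materialize more than one ordering of the triples.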
