STLIS: A Scalable Two-Level Index Scheme for Big Data in IoT

The rapid development of the Internet of Things (IoT) has caused dramatic growth in data, which poses a major challenge for the storage and fast retrieval of big data. As an effective representation model, RDF has received considerable attention, and a growing number of storage and index schemes have been developed for it. On large-scale RDF data, however, most of these schemes suffer from a large number of self-joins, high storage cost, and many intermediate results. In this paper, we propose a scalable two-level index scheme (STLIS) for RDF data. In the first level, we devise a compressed path template tree (CPTT) index, based on the S-tree, to retrieve the candidate sets of full paths. In the second level, we create a hierarchical edge index (HEI) and a node-predicate (NP) index to accelerate matching. Extensive experiments are conducted on two representative RDF benchmarks and one real IoT RDF dataset, comparing STLIS with three representative index schemes: RDF-3X, BitMat, and TripleBit. The results demonstrate that the proposed scheme can respond to complex queries in real time and saves considerable storage space compared with RDF-3X and BitMat.
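The first-level index builds on the S-tree, a signature tree in which each entry is a bit signature and a parent signature is the bitwise OR of its children. The following is a minimal sketch of the signature-containment filtering idea only; it is not the CPTT structure described in the paper. The signature width, the hashing scheme, the function names, and the example predicates are all assumptions made for illustration.

```python
import hashlib

SIG_BITS = 64  # assumed signature width; not taken from the paper


def label_signature(label: str, bits_per_label: int = 3) -> int:
    """Map one predicate label to a bit mask with a few bits set."""
    sig = 0
    for i in range(bits_per_label):
        digest = hashlib.sha256(f"{i}:{label}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def path_signature(predicates) -> int:
    """Superimpose (bitwise OR) the signatures of all predicates on a path."""
    sig = 0
    for predicate in predicates:
        sig |= label_signature(predicate)
    return sig


def may_contain(node_sig: int, query_sig: int) -> bool:
    """True if every bit of the query signature is set in the node signature.

    A False result proves the node cannot match; a True result is only a
    candidate and may be a false positive that a later step must verify.
    """
    return (node_sig & query_sig) == query_sig


def candidate_paths(templates, query_predicates):
    """Return the path templates that pass the signature filter."""
    query_sig = path_signature(query_predicates)
    return [t for t in templates if may_contain(path_signature(t), query_sig)]
```

For example, calling `candidate_paths([("worksFor", "locatedIn"), ("hasSensor", "observes", "unit")], ["hasSensor", "observes"])` always returns the second template, because its signature is a superset of the query signature. The first template is normally pruned, although a hash collision could let it through as a false positive, which is why a verification step (the role the second-level indexes play in the paper) is still needed.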
