A Hybrid Approach Combining R*-Tree and k-d Trees to Improve Linked Open Data Query Performance

The Semantic Web has recently gained traction through the use of Linked Open Data (LOD) on the Web. Although numerous state-of-the-art methodologies, standards, and technologies are applicable to the LOD cloud, many issues persist. Because the LOD cloud is based on graph-based Resource Description Framework (RDF) triples and the SPARQL query language, we cannot directly adopt traditional techniques employed for database management systems or distributed computing systems. This paper addresses how the LOD cloud can be efficiently organized, retrieved, and evaluated. We propose a novel hybrid approach that combines the index and live exploration approaches for improved LOD join query performance. Using a two-step index structure that combines a disk-based 3D R*-tree with an extended multidimensional histogram and flash memory-based k-d trees, we can efficiently discover interlinked data distributed across multiple resources. Because this method rapidly prunes numerous false hits, the performance of join query processing is remarkably improved. We also propose a hot-cold segment identification algorithm to identify regions of high interest. The proposed method is compared with existing popular methods on real RDF datasets. Results indicate that our method outperforms the existing methods because it quickly obtains target results by reducing unnecessary data scanning and reduces the amount of main memory required to load filtering results.
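
The two-step filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each RDF triple has already been mapped to a 3D integer point (subject, predicate, object), it replaces the disk-based R*-tree with a flat list of bounding boxes, and it omits the histogram, the flash storage layer, and hot-cold identification. All names (`Segment`, `make_segment`, `two_step_query`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int, int]   # assumed encoding: (subject_id, predicate_id, object_id)
Box = Tuple[Point, Point]      # (min corner, max corner), both inclusive


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap on every axis."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside the box."""
    return all(box[0][i] <= p[i] <= box[1][i] for i in range(3))


@dataclass
class KDNode:
    point: Point
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kd(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % 3
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kd(pts[:mid], depth + 1),
                  build_kd(pts[mid + 1:], depth + 1))


def kd_range(node: Optional[KDNode], query: Box, out: List[Point]) -> None:
    """Collect every point of the subtree that falls inside the query box."""
    if node is None:
        return
    if contains(query, node.point):
        out.append(node.point)
    split = node.point[node.axis]
    if query[0][node.axis] <= split:   # left subtree holds values <= split
        kd_range(node.left, query, out)
    if query[1][node.axis] >= split:   # right subtree holds values >= split
        kd_range(node.right, query, out)


@dataclass
class Segment:
    mbr: Box                  # bounding box, standing in for an R*-tree leaf entry
    tree: Optional[KDNode]    # per-segment k-d tree used for refinement


def make_segment(points: List[Point]) -> Segment:
    """Wrap a non-empty group of points in a bounding box plus a k-d tree."""
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return Segment((lo, hi), build_kd(points))


def two_step_query(segments: List[Segment], query: Box) -> List[Point]:
    """Step 1: prune segments by bounding box. Step 2: refine with the k-d tree."""
    hits: List[Point] = []
    for seg in segments:
        if intersects(seg.mbr, query):
            kd_range(seg.tree, query, hits)
    return sorted(hits)
```

In this sketch, a triple pattern such as "any subject, predicate 5, any object" becomes a box that is wide on the subject and object axes and one unit thick on the predicate axis. Segments whose bounding boxes miss the query box are skipped without touching their k-d trees, which is the pruning of false hits that the abstract refers to.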
