Paper information - Towards In-Memory RDFS Entailment

1 Motivation

Massive publication efforts have enriched the Web with huge amounts of semantic data represented in RDF [7], and reasoning tasks at such scale are a formidable challenge. RDF Schema (RDFS) [6] defines the simplest form of inference in RDF: it introduces a vocabulary with predefined semantics to describe relationships such as the typing of entities and hierarchies of classes and properties. This vocabulary allows one to infer new facts, not originally explicit in the RDF graph, by means of a process called RDFS entailment.

Traditionally, two types of solutions tackle RDFS entailment. On the one hand, all facts that can be inferred from an RDF graph can be materialized and added to the graph. The result is referred to as the graph closure, and it makes it easy to check whether a triple is entailed by the graph. However, the closure can be of size quadratic in the size of the initial graph, which is not a practical bound from a database point of view [9]. Although some approaches can compute huge closures on distributed systems [11, 12], the final (potentially massive) closure still has to be managed and queried, at the cost of high latencies. On the other hand, one could keep the original graph and check entailment on demand. Unfortunately, these solutions incur a potentially large number of I/O accesses at huge scale [5], which discourages broader adoption whenever fast responses are required.

This scenario calls for forms of lightweight entailment that could make reasoning feasible at Web scale. Solutions will necessarily involve finding space/time tradeoffs that (i) reduce storage requirements and (ii) minimize I/O costs. These two issues are closely related, and some clever form of compression would address both. In fact, an in-memory solution achieves both objectives if the compressed data can be accessed directly, without prior decompression, which keeps the memory footprint small. Such a solution would remove the need to maintain graph closures and would allow the design of an efficient on-demand algorithm that performs triple checking in main memory. Our ongoing work, described in the next section, follows these ideas by compressing the RDF graph using RDF/HDT [3], a data structure known to ensure a reduced memory footprint while providing fast triple pattern resolution in main memory [8, 4].
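The contrast between materializing the closure and checking entailment on demand can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not use RDF/HDT: it assumes an uncompressed in-memory set of triples and covers only two RDFS rules (transitivity of `rdfs:subClassOf` and type inheritance along it). The function names `closure` and `entails` and the `ex:` terms are hypothetical, chosen for illustration.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Materialize every fact derivable with the two rules (fixpoint loop)."""
    closed = set(triples)
    while True:
        # Index the subclass edges known so far.
        sub = defaultdict(set)
        for s, p, o in closed:
            if p == SUBCLASS:
                sub[s].add(o)
        # (s type/subClassOf o) and (o subClassOf c) yield (s type/subClassOf c).
        new = set()
        for s, p, o in closed:
            if p in (TYPE, SUBCLASS):
                for parent in sub.get(o, ()):
                    new.add((s, p, parent))
        if new <= closed:
            return closed
        closed |= new


def entails(triples, query):
    """Check one triple on demand, walking the class hierarchy upwards."""
    s, p, o = query
    if query in triples:
        return True
    if p not in (TYPE, SUBCLASS):
        return False
    sub = defaultdict(set)
    for a, q, b in triples:
        if q == SUBCLASS:
            sub[a].add(b)
    # Start from the classes directly attached to s through predicate p.
    frontier = [b for a, q, b in triples if a == s and q == p]
    seen = set()
    while frontier:
        c = frontier.pop()
        if c == o:
            return True
        if c in seen:
            continue
        seen.add(c)
        frontier.extend(sub.get(c, ()))
    return False


# Hypothetical example graph.
graph = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}

materialized = closure(graph)
is_agent = entails(graph, ("ex:alice", TYPE, "ex:Agent"))
```

In this sketch `closure` pays in space, since the materialized set grows beyond the original graph, while `entails` leaves the graph untouched and pays in traversal time per query. That is the tradeoff the section describes.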

Miguel A. Martínez-Prieto | Javier D. Fernández | Claudio Gutiérrez | Jorge Pérez

[1] Axel Polleres, et al. Binary RDF representation for publication and exchange (HDT), 2013, J. Web Semant.

[2] George H. L. Fletcher, et al. Efficient RDFS Entailment in External Memory, 2011, OTM Workshops.

[3] Frank van Harmelen, et al. OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, 2010, ESWC.

[4] Gonzalo Navarro, et al. Succinct Trees in Practice, 2010, ALENEX.

[5] Javier D. Fernández. Binary RDF for scalable publishing, exchanging and consumption in the web of data, 2012, WWW.

[6] Jorge Pérez, et al. Simple and Efficient Minimal RDFS, 2009, J. Web Semant.

[7] Miguel A. Martínez-Prieto, et al. Exchange and Consumption of Huge RDF Data, 2012, ESWC.

[8] Marcelo Arenas, et al. nSPARQL: A navigational language for RDF, 2010, J. Web Semant.

[9] James A. Hendler, et al. Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples, 2009, SEMWEB.