SPARQL Query Answering on a Shared-nothing Architecture

The amount of Semantic Web data is outgrowing the capacity of Semantic Web stores. As with traditional databases, scaling up RDF stores faces a design dilemma: increase the number of nodes at the cost of increased complexity, or use sophisticated and expensive hardware that can support large amounts of memory, high disk bandwidth and low seek latency. In this paper, we propose a technique for distributed, join-less RDF query answering based on query pattern-driven indexing. To this end, we first propose an extension of SPARQL to specify query patterns. These patterns are used to build query-specific indexes using MapReduce; the indexes are later queried using a NoSQL store. We provide a preliminary evaluation of our results using Hadoop and HBase, indicating that, for a predefined query pattern, our system offers very high query throughput and fast response times.
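
To make the idea of pattern-driven, join-less answering concrete, the following is a minimal sketch and not the system described in this paper. It assumes a hypothetical query pattern "names of all people who work at a given organization", in which the organization is the pattern's parameter. The join on the shared person variable is performed once at index-build time in a map step and a reduce step, and a plain Python dictionary stands in for the NoSQL store. The triples and the names `map_phase`, `reduce_phase` and `answer` are all illustrative assumptions.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples; all data is made up.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]


def map_phase(triples):
    """Emit (person, tagged value) pairs, keyed on the shared join variable."""
    for s, p, o in triples:
        if p == "worksAt":
            yield s, ("org", o)
        elif p == "name":
            yield s, ("name", o)


def reduce_phase(pairs):
    """Group by person, then emit index rows keyed on the pattern parameter."""
    grouped = defaultdict(list)
    for person, tagged in pairs:
        grouped[person].append(tagged)

    index = defaultdict(list)
    for tagged_values in grouped.values():
        orgs = [v for tag, v in tagged_values if tag == "org"]
        names = [v for tag, v in tagged_values if tag == "name"]
        for org in orgs:
            index[org].extend(names)
    # Sort so that lookups return results in a deterministic order.
    return {org: sorted(names) for org, names in index.items()}


def answer(index, org):
    """Answer the predefined pattern with a single key lookup, with no join."""
    return index.get(org, [])


# Build the query-specific index once; every later query is a lookup.
INDEX = reduce_phase(map_phase(TRIPLES))
```

In this sketch, `answer(INDEX, "acme")` only reads one key, so the query-time cost does not depend on the size of the graph. The trade-off is that the index serves only the pattern it was built for.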