Semantic Index: Scalable Query Answering without Forward Chaining or Exponential Rewritings

Entailment regimes add support for rich inferences in SPARQL 1.1, which greatly facilitates the use of reasoning in applications. In this context, a topic of special interest to the Semantic Web (SW) community is Ontology-Based Data Access (OBDA), i.e., querying large volumes of assertional data through the vocabulary and semantics of ontologies. OBDA has received a lot of attention in recent years. However, while there have been advances on the theoretical side, e.g., the definition of OWL 2 QL, realizing efficient and scalable reasoning for large ontologies and large data sets is still problematic. The most widespread reasoning technique for query answering is the materialization of inferences using forward chaining. This technique has several advantages: all inferences are computed off-line, it is relatively easy to implement, and it offers high performance at query time. However, if the terminological part of the ontology is large, materialization may take a long time and may considerably increase the storage requirements of the application. These disadvantages can make the technique undesirable or infeasible in several relevant use cases. An alternative to materialization is query answering by query rewriting, in which all reasoning is done on-line. Query rewriting has often been promoted as the most efficient way to query large volumes of data; in practice, however, these techniques have not been widely adopted. The main reason is that the queries generated by rewriting are often too large or too complex; for example, for large OWL 2 QL ontologies, rewritings often consist of hundreds or thousands of subqueries. In this paper we present a technique that combines off-line and on-line reasoning to avoid these issues of materialization and query rewriting while guaranteeing, in practice, minimal time and space for the construction of the triple store and fast query answering.
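To make the trade-off above concrete, here is a minimal sketch on a toy class hierarchy. All class names, the table `class_assertion`, and every function are hypothetical and chosen only for illustration. It contrasts three strategies: forward chaining stores one assertion per entailed superclass; a naive rewriting replaces each class atom by every one of its subclasses, so a query with k atoms over a class with n subclasses yields n^k subqueries; and, as a preview of the semantic index approach introduced next, a depth-first numbering gives every class a contiguous interval covering its descendants, so entailed instances are retrieved with one range query over one stored row per assertion. The sketch assumes a tree-shaped hierarchy of classes only; the technique as described also indexes properties, and we assume a DAG-shaped hierarchy would need more than one interval per class. Real OWL 2 QL rewriting algorithms also handle property and existential axioms, which this sketch ignores.

```python
import sqlite3
from itertools import product

# Hypothetical toy TBox: class -> direct subclasses.
# Assumption: the hierarchy is a tree (one interval per class suffices).
CHILDREN = {
    "Person": ["Employee", "Student"],
    "Employee": ["Faculty", "AdminStaff"],
    "Faculty": ["Professor", "Lecturer"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}
ALL_CLASSES = sorted(set(CHILDREN) | set(PARENT))

def superclasses(cls):
    """cls plus all of its transitive superclasses."""
    result = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result

def subclasses(cls):
    """cls plus all of its transitive subclasses."""
    return [c for c in ALL_CLASSES if cls in superclasses(c)]

# 1) Materialization (forward chaining): one stored assertion per entailed class.
def materialize(abox):
    return {(x, sup) for x, cls in abox for sup in superclasses(cls)}

# 2) Naive rewriting: each atom (var, C) becomes a union over C's subclasses,
#    so the number of conjunctive subqueries is the product of those counts.
def rewrite(query):
    return list(product(*([(var, sub) for sub in subclasses(cls)] for var, cls in query)))

# 3) Interval encoding (hypothetical simplification of a semantic index):
#    DFS numbering so [start, end] covers exactly the class and its descendants.
def assign_intervals(root):
    intervals, counter = {}, 0

    def visit(cls):
        nonlocal counter
        start = counter
        counter += 1
        for child in CHILDREN.get(cls, []):
            visit(child)
        intervals[cls] = (start, counter - 1)

    visit(root)
    return intervals

def build_store(abox, intervals):
    """One row per ABox assertion, tagged with the start index of its class."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE class_assertion (individual TEXT, idx INTEGER)")
    conn.executemany("INSERT INTO class_assertion VALUES (?, ?)",
                     [(x, intervals[cls][0]) for x, cls in abox])
    return conn

def instances_of(conn, intervals, cls):
    """Asserted and entailed instances of cls via a single range query."""
    start, end = intervals[cls]
    rows = conn.execute("SELECT DISTINCT individual FROM class_assertion "
                        "WHERE idx BETWEEN ? AND ? ORDER BY individual", (start, end))
    return [row[0] for row in rows]

ABOX = [("alice", "Professor"), ("bob", "Student"), ("carol", "AdminStaff")]
intervals = assign_intervals("Person")
store = build_store(ABOX, intervals)

print(len(materialize(ABOX)))  # 9 stored assertions instead of 3
print(len(rewrite([("x", "Person"), ("y", "Person"), ("z", "Person")])))  # 343
print(instances_of(store, intervals, "Employee"))  # ['alice', 'carol']
```

Here `Employee` receives the interval [1, 5], which contains the indexes of `Professor` (3) and `AdminStaff` (5) but not `Student` (6), so the store keeps only the three original rows while still answering the query over the hierarchy.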
We formulate our technique using relational database management systems (RDBMSs) as the data backend; however, the technique can easily be adapted to native triple stores. Likewise, in the following we focus on the OWL 2 QL direct entailment regime, but the technique can also be used with the RDFS regime.

Semantic Index. The core idea of the semantic index technique is to encode the entailed hierarchies of the terminology of the OWL 2 QL ontology, i.e., the TBox T, into numeric indexes that we assign to classes and properties. We use these values to insert the assertional data of the ontology, i.e., the ABox A, into the DB, and use range queries to retrieve the triples entailed by the hierarchies and the ABox assertions. This allows us to create triple repositories that are nearly the same size as the original data and already encode most of the semantics of the ontology. Combined with a simple rewriting technique, this yields fast and scalable query answering for SPARQL 1.1 ABox queries under the OWL 2 QL entailment regime while preserving soundness and completeness. Our proposal is strongly related to techniques for managing large transitive relations in