Querying the Semantic Web

The explosive growth of RDF data on the Semantic Web drives the need for novel database techniques that can efficiently store and query large RDF datasets. To achieve good query performance and scalability, many existing RDF stores use a relational database management system (RDBMS) as a backend to manage RDF data. The main challenge of this approach is translating RDF queries, formulated in the SPARQL query language, into equivalent relational algebra expressions and SQL queries. In this book, we formalize a relational-algebra-based semantics of SPARQL; define the first provably semantics-preserving SPARQL-to-SQL translation in the literature; describe a novel relational join, the nested optional join, for efficiently evaluating SPARQL queries; and design RDFProv, the first relational RDF store optimized for querying the Semantic Web of scientific workflow provenance. The book features a number of performance studies and comparisons with existing systems and approaches. The advanced query techniques in this book should be useful to students, instructors, computer scientists, and IT professionals whose research interests lie in the areas of the Semantic Web, databases, and scientific workflows.