Semantic Web Technologies and Big Data Infrastructures: SPARQL Federated Querying of Heterogeneous Big Data Stores

The ability to cross-link large scale data with each other and with structured Semantic Web data, and the ability to uniformly process Semantic Web and other data adds value to both the Semantic Web and to the Big Data community. This paper presents work in progress towards integrating Big Data infrastructures with Semantic Web technologies, allowing for the cross-linking and uniform retrieval of data stored in both Big Data infrastructures and Semantic Web data. The technical challenges involved in achieving this, pertain to both data and system interoperability: we need a way to make the semantics of Big Data explicit so that they can interlink and we need a way to make it transparent for the client applications to query federations of such heterogeneous systems. The paper presents an extension of the Semagrow federated SPARQL query processor that is able to seamlessly federated SPARQL endpoints, Cassandra databases, and Solr databases, and discusses future directions of this line of work.