Balloon Fusion: SPARQL rewriting based on unified co-reference information

Although Linked Open Data has grown enormously in volume, there is still no single point of access for querying the more than 200 SPARQL repositories. In this paper we present Balloon Fusion, a SPARQL 1.1 rewriting and query federation service built on crawling and consolidating co-reference relationships in over 100 reachable Linked Data SPARQL Endpoints. This process yields 17.6M co-reference statements, clustered into 8.4M distinct semantic entities, which are now available for download for further analysis. The proposed SPARQL rewriting substitutes every URI occurrence with its synonyms and combines this with automatic endpoint selection based on URI origin to achieve comprehensive query federation. While we show the technical feasibility, we also critically reflect on the current status of the Linked Open Data cloud: although it is huge in size, access via SPARQL Endpoints is complicated in most cases by a lack of quality of service.
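The three steps named in the abstract (clustering co-reference statements into entities, substituting URIs with their synonyms, and selecting endpoints by URI origin) can be illustrated with a minimal sketch. It is not the Balloon Fusion implementation: the function names, the example URIs, the endpoint registry, and the choice of a union-find clustering and a SPARQL 1.1 `VALUES` clause are all assumptions made for illustration, and the textual rewrite is deliberately naive (it does not parse the query).

```python
import re
from urllib.parse import urlparse

# Matches <...> URI references; a naive textual match, not a SPARQL parser.
URI_PATTERN = re.compile(r"<([^<>\s]+)>")


def cluster(pairs):
    """Group co-reference pairs into synonym sets using union-find.

    Returns a dict mapping every URI to the set of all URIs in its cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return {uri: members for members in clusters.values() for uri in members}


def rewrite(query, synonyms):
    """Replace each URI that has synonyms by a variable bound via VALUES."""
    variables = {}
    values_clauses = []

    def substitute(match):
        uri = match.group(1)
        group = synonyms.get(uri)
        if not group or len(group) < 2:
            return match.group(0)  # no known synonyms: leave the URI as-is
        if uri not in variables:
            variables[uri] = f"?syn{len(variables)}"
            members = " ".join(f"<{u}>" for u in sorted(group))
            values_clauses.append(f"VALUES {variables[uri]} {{ {members} }}")
        return variables[uri]

    body = URI_PATTERN.sub(substitute, query)
    if not values_clauses:
        return body
    # Place the VALUES clauses directly after the first opening brace.
    head, brace, tail = body.partition("{")
    return head + brace + " " + " ".join(values_clauses) + tail


def select_endpoints(uris, registry):
    """Pick endpoints whose registered host matches the origin of a URI.

    `registry` is a hypothetical host-to-endpoint mapping.
    """
    hosts = {urlparse(u).netloc for u in uris}
    return sorted({registry[h] for h in hosts if h in registry})
```

Under these assumptions, a query mentioning one URI would be expanded to cover every URI in its cluster, and the hosts of those URIs would determine which endpoints receive the rewritten query. How the real service distributes the query and merges the results is not specified in the abstract.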
