Web Semantics: Science, Services and Agents on the World Wide Web

Abstract. In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as "existential variables". We first introduce the theoretical precedent for existential blank nodes from first-order logic and incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We present an empirical survey of the blank nodes present in a large sample of RDF data published on the Web (the BTC-2012 dataset), where we find that 25.7% of unique RDF terms are blank nodes, that 44.9% of documents and 66.2% of domains featured use of at least one blank node, and that aside from one Linked Data domain whose RDF data contains many "blank node cycles", the vast majority of blank nodes form tree structures that are efficient to compute simple entailment over. With respect to the RDF-merge of the full data, we show that 6.1% of blank nodes are redundant under simple entailment. The vast majority of non-lean cases are isomorphisms resulting from multiple blank nodes with no discriminating information being given within an RDF document or documents being duplicated in multiple Web locations. Although simple entailment is NP-complete and leanness-checking is coNP-complete, in computing this latter result, we demonstrate that in practice, real-world RDF graphs are sufficiently "rich" in ground information for problematic cases to be avoided by non-naive algorithms.
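To make the notion of a blank node being "redundant under simple entailment" concrete, the following is a minimal sketch of a naive leanness check. It is not the method used in the paper: it enumerates every mapping of blank nodes to terms and is exponential in the number of blank nodes, which is exactly the cost that non-naive algorithms avoid. The triple encoding (tuples of strings, with blank nodes written with a `_:` prefix) and the names `is_blank` and `is_lean` are assumptions made for illustration.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix, as in N-Triples.
    return term.startswith("_:")


def is_lean(graph):
    """Return True if no blank-node mapping sends the graph onto a proper subgraph of itself.

    `graph` is an iterable of (subject, predicate, object) string triples.
    Brute force: tries every mapping of blank nodes to subject/object terms.
    """
    graph = set(graph)
    # Blank nodes only occur in subject/object position, so any image that
    # keeps the mapped triple inside the graph must also occur there.
    terms = sorted({t for s, _, o in graph for t in (s, o)})
    blanks = [t for t in terms if is_blank(t)]
    for image in product(terms, repeat=len(blanks)):
        mu = dict(zip(blanks, image))
        mapped = {(mu.get(s, s), p, mu.get(o, o)) for s, p, o in graph}
        if mapped < graph:  # proper subset: some triple is redundant
            return False
    return True


# Two blank nodes with no discriminating information: one is redundant.
non_lean = [("ex:a", "ex:p", "_:b1"), ("ex:a", "ex:p", "_:b2")]
# The single blank node carries information no ground term carries.
lean = [("ex:a", "ex:p", "_:b1"), ("_:b1", "ex:q", "ex:c")]

print(is_lean(non_lean), is_lean(lean))
```

The first example mirrors the most common non-lean case reported above: several blank nodes with identical, non-discriminating descriptions in one document.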
