Linked 'Big' Data: Towards a Manifold Increase in Big Data Value and Veracity

The Web of Data is an increasingly rich source of information, which makes it useful for Big Data analysis. However, there is no guarantee that the Web of Data will provide the consumer with truthful and valuable information. Most research has focused on Big Data's Volume, Velocity, and Variety dimensions, while Veracity and Value, often regarded as the fourth and fifth dimensions, have been largely overlooked. In this paper, we discuss the potential of Linked Data methods to tackle all five V's, and in particular propose methods for addressing the last two dimensions. We draw parallels between Linked Data and Big Data methods, and propose applying existing methods to improve and maintain quality and so address Big Data's Veracity challenge.
