Large-scale Linked Data Processing - Cloud Computing to the Rescue?

Processing large volumes of Linked Data requires sophisticated methods and tools. In recent years we have mainly focused on systems based on relational databases and on bespoke systems for Linked Data processing. Cloud computing offerings such as SimpleDB or BigQuery, cloud-enabled NoSQL systems such as Cassandra or CouchDB, and frameworks such as Hadoop offer appealing alternatives, along with great promises concerning performance, scalability and elasticity. In this paper we state a number of Linked Data-specific requirements and review existing cloud computing offerings, as well as NoSQL systems that may be used in a cloud computing setup, in terms of their applicability and usefulness for processing datasets at large scale.
