A Scalability Metric for Parallel Computations on Large, Growing Datasets (like the Web)

One of the greatest challenges facing computations on data crawled from the Web is scaling to such large quantities of data. Some computations are less affected by this than others, but inference on the Semantic Web is certainly limited in this regard. Parallelism has been employed to scale inference to larger datasets, but evaluations in recent work have fallen back on common parallel computing metrics that do not apply to this specific scalability challenge. This position paper names this scalability challenge data scaling and defines the metric growth efficiency.
