Formalizing MapReduce with CSP

MapReduce is a widely used programming model for processing and generating large data sets distributed across large clusters of machines. Given its widespread use, its validity and other major properties need to be analyzed within a formal framework. In this paper, we present a formal model of MapReduce using Communicating Sequential Processes (CSP). We focus on the dominant parts of MapReduce and formalize them in detail. The formal model makes the processing and function of each component explicit. We also illustrate the model with an example computation, whose result reflects the validity of MapReduce in some appropriate applications.
