Towards Formalizing of MapReduce

MapReduce is a powerful distributed computing model that is widely used across many domains to process massive amounts of data. Formal methods are one appropriate way to ensure its correctness. In this paper, we propose a formal language for modeling MapReduce programs, building on our previous work. The language describes the MapReduce programming model in terms of files and blocks, so that the details of data processing during a MapReduce computation can be shown clearly. Parallel commands are also introduced to reflect the parallelism of the computation. This language provides a basis for verifying the correctness of the MapReduce programming model.
