Towards Formal Modeling and Verification of Cloud Architectures: A Case Study on Hadoop

Hadoop is a popular open-source implementation of MapReduce with a number of prominent users, including Yahoo!, Facebook, and Twitter. Although several works have focused on deploying algorithms on Hadoop MapReduce, research into applying formal methods to prove the correctness of Hadoop systems is limited. In this paper we propose a holistic approach to verifying the correctness of Hadoop systems using model checking techniques. We model Hadoop's parallel architecture to constrain it to a valid start-up ordering, and we identify and prove properties including the benefits of data locality, deadlock-freeness, and non-termination, among others.
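To make the idea of checking start-up ordering and deadlock-freeness concrete, the following is a minimal sketch of explicit-state exploration in Python. It is not the model from the paper: the component names, the dependency relation in `DEPENDS_ON`, and the helper functions are all assumptions chosen for illustration. A state is the set of components that are running, a transition starts one component whose dependencies are already up, and a deadlock is a reachable state with no transitions in which some component is still down.

```python
from collections import deque

# Hypothetical start-up dependencies (an assumption, not taken from the paper):
# each component may start only after every component it depends on is running.
DEPENDS_ON = {
    "NameNode": set(),
    "DataNode": {"NameNode"},
    "JobTracker": {"NameNode"},
    "TaskTracker": {"JobTracker", "DataNode"},
}


def successors(state, deps):
    """Yield every state reachable by starting one component that is ready."""
    for component, required in deps.items():
        if component not in state and required <= state:
            yield state | {component}


def explore(deps):
    """Breadth-first search over reachable states.

    Returns (reachable_states, deadlocked_states).
    """
    initial = frozenset()
    goal = frozenset(deps)  # every component is running
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        next_states = list(successors(state, deps))
        if not next_states and state != goal:
            deadlocks.append(state)  # stuck before full start-up
        for nxt in next_states:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, deadlocks


def ordering_holds(states, deps):
    """Safety check: in every state, each running component has its dependencies running."""
    return all(deps[c] <= state for state in states for c in state)


if __name__ == "__main__":
    states, deadlocks = explore(DEPENDS_ON)
    print("reachable states:", len(states))
    print("deadlocks:", deadlocks)
    print("ordering holds:", ordering_holds(states, DEPENDS_ON))
```

Under these assumed dependencies the search finds no deadlocked state. Introducing a cycle, for example two components that each wait on the other, leaves the initial state stuck, and the search reports it as a deadlock. A dedicated model checker explores a far richer state space than this toy, but the reachability argument has the same shape.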
