Using Coq in Specification and Program Extraction of Hadoop MapReduce Applications

Hadoop MapReduce is a framework for distributed computation on key-value pairs. The goal of this research is to verify actual running code of MapReduce applications. We first constructed an abstract model of MapReduce computation with the proof assistant Coq. In the model, mappers and reducers in MapReduce computation are modeled as functions in Coq, and a specification of a MapReduce application is expressed in terms of invariants among functions involving its mapper and reducer. The model also provides modular proofs of lemmas that do not depend on applications. To achieve the goal, we investigated the feasibility of two approaches. In one approach, we transformed verified mapper and reducer functions into Haskell programs and executed them under Hadoop Streaming. In the other approach, we verified JML annotations on Java programs of the mapper and reducer using Krakatoa, translated them into Coq axioms, and proved Coq specifications from them. In either approach, we were able to verify correctness of MapReduce applications that actually run on the Hadoop MapReduce framework.

[1]  Simon Marlow,et al.  Haskell 2010 Language Report , 2010 .

[2]  Sven Apel,et al.  Static type checking of Hadoop MapReduce programs , 2011, MapReduce '11.

[3]  Pierre Castéran,et al.  Interactive Theorem Proving and Program Development , 2004, Texts in Theoretical Computer Science An EATCS Series.

[4]  Rajeev Alur,et al.  A Temporal Logic of Nested Calls and Returns , 2004, TACAS.

[5]  David Aspinall,et al.  Formalising Java's Data Race Free Guarantee , 2007, TPHOLs.

[6]  Jacek Chrząszcz Implementing Modules in the Coq System , 2003, TPHOLs.

[7]  Frank D. Valencia,et al.  Formal Methods for Components and Objects , 2002, Lecture Notes in Computer Science.

[8]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[9]  Claude Marché,et al.  The KRAKATOA tool for certificationof JAVA/JAVACARD programs annotated in JML , 2004, J. Log. Algebraic Methods Program..

[10]  Jimmy J. Lin,et al.  Book Reviews: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer , 2010, CL.

[11]  Qin Li,et al.  Formalizing MapReduce with CSP , 2010, 2010 17th IEEE International Conference and Workshops on Engineering of Computer Based Systems.

[12]  Gary T. Leavens,et al.  Beyond Assertions: Advanced Specification and Verification with JML and ESC/Java2 , 2005, FMCO.

[13]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[14]  Ralf Lämmel,et al.  Google's MapReduce programming model - Revisited , 2007, Sci. Comput. Program..

[15]  David Detlefs,et al.  Simplify: a theorem prover for program checking , 2005, JACM.

[16]  Nikolaj Bjørner,et al.  Z3: An Efficient SMT Solver , 2008, TACAS.