Result Integrity Check for MapReduce Computation on Hybrid Clouds

Large scale adoption of MapReduce computations on public clouds is hindered by the lack of trust on the participating virtual machines, because misbehaving worker nodes can compromise the integrity of the computation result. In this paper, we propose a novel MapReduce framework, Cross Cloud MapReduce (CCMR), which overlays the MapReduce computation on top of a hybrid cloud: the master that is in control of the entire computation and guarantees result integrity runs on a private and trusted cloud, while normal workers run on a public cloud. In order to achieve high accuracy, CCMR proposes a result integrity check scheme on both the map phase and the reduce phase, which combines random task replication, random task verification, and credit accumulation, and CCMR strives to reduce the overhead by reducing cross-cloud communication. We implement our approach based on Apache Hadoop MapReduce and evaluate our implementation on Amazon EC2. Both theoretical and experimental analysis show that our approach can guarantee high result integrity in a normal cloud environment while incurring non-negligible performance overhead (e.g., when 16.7% workers are malicious, CCMR can guarantee at least 99.52% of accuracy with 33.6% of overhead when replication probability is 0.3 and the credit threshold is 50).

[1]  Ting Yu,et al.  SecureMR: A Service Integrity Assurance Framework for MapReduce , 2009, 2009 Annual Computer Security Applications Conference.

[2]  Kirk P. Arnett,et al.  The size of the IT job market , 2008, CACM.

[3]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[4]  Helen J. Wang,et al.  Enabling Security in Cloud Storage SLAs with CloudProof , 2011, USENIX ATC.

[5]  Philippe Golle,et al.  Secure Distributed Computing in a Commercial Environment , 2002, Financial Cryptography.

[6]  Wenliang Du,et al.  Uncheatable grid computing , 2004, 24th International Conference on Distributed Computing Systems, 2004. Proceedings..

[7]  Ari Juels,et al.  HAIL: a high-availability and integrity layer for cloud storage , 2009, CCS.

[8]  Tejaswi Redkar,et al.  Windows Azure Compute , 2011 .

[9]  Chris GauthierDickey,et al.  Result verification and trust-based scheduling in peer-to-peer grids , 2005, Fifth IEEE International Conference on Peer-to-Peer Computing (P2P'05).

[10]  Ahmad-Reza Sadeghi,et al.  AmazonIA: when elasticity snaps back , 2011, CCS '11.

[11]  Jinpeng Wei,et al.  VIAF: Verification-Based Integrity Assurance Framework for MapReduce , 2011, 2011 IEEE 4th International Conference on Cloud Computing.