Securing MapReduce Result Integrity via Verification-based Integrity Assurance Framework

MapReduce, a large-scale data processing paradigm, is gaining popularity. However, like other distributed computing frameworks, MapReduce is vulnerable to integrity attacks: malicious workers in a MapReduce cluster can tamper with their computation results and thereby render the overall job result inaccurate. Existing solutions are effective against non-collusive malicious workers, but are less effective at detecting collusive ones. In this paper, we propose the Verification-based Integrity Assurance Framework (VIAF). By using task replication and probabilistic result verification, VIAF can detect both non-collusive and collusive workers, even if malicious workers dominate the environment. We have implemented VIAF on Hadoop, an open-source MapReduce implementation. Our theoretical analysis and experimental results show that VIAF can achieve high job accuracy while imposing moderate performance overhead.
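To make the combination of task replication and probabilistic verification concrete, the following is a minimal Python sketch, not the VIAF implementation. It assumes each task is sent to several replica workers, that disagreeing replicas are flagged immediately, and that agreeing replicas are re-checked by a trusted verifier with some probability. The names `check_task`, `escape_probability`, `honest`, and `colluding` are hypothetical and do not come from the paper or from Hadoop.

```python
import random


def honest(x):
    """A hypothetical honest worker: computes the task correctly."""
    return x * x


def colluding(x):
    """A hypothetical colluding worker: all colluders return the same wrong value."""
    return x * x + 1


def check_task(x, replicas, trusted, verify_prob, rng):
    """Run one task on every replica and decide whether to accept the result.

    Assumption (not from the source): replicas that disagree are flagged at once;
    replicas that agree are re-checked by a trusted verifier with probability
    verify_prob.
    """
    results = [worker(x) for worker in replicas]
    if len(set(results)) > 1:
        # Replication alone exposes non-collusive cheaters.
        return "mismatch"
    if rng.random() < verify_prob:
        # Colluders agree with each other, so only the verifier can expose them.
        if trusted(x) != results[0]:
            return "caught_by_verifier"
        return "verified"
    return "accepted_unverified"


def escape_probability(verify_prob, cheated_tasks):
    """Chance that colluders who cheat on every task are never verified,
    assuming each verification decision is independent."""
    return (1.0 - verify_prob) ** cheated_tasks


if __name__ == "__main__":
    rng = random.Random(0)
    print(check_task(3, [honest, colluding], honest, 0.2, rng))
    print(check_task(3, [colluding, colluding], honest, 1.0, rng))
    print(escape_probability(0.2, 10))
```

Under these assumptions, replication catches a cheater paired with an honest worker, while two colluders that return the same wrong answer are caught only when the verifier is invoked. The chance of never being verified shrinks geometrically with the number of tasks they cheat on.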
