SecureMR: A Service Integrity Assurance Framework for MapReduce

MapReduce has become increasingly popular as a powerful parallel data processing model. To deploy MapReduce as a data processing service over open systems such as service-oriented architecture, cloud computing, and volunteer computing, we must provide the necessary security mechanisms to protect the integrity of MapReduce data processing services. In this paper, we present SecureMR, a practical service integrity assurance framework for MapReduce. SecureMR consists of five security components, which provide a set of practical security mechanisms that not only ensure MapReduce service integrity and prevent replay and Denial of Service (DoS) attacks, but also preserve the simplicity, applicability, and scalability of MapReduce. We have implemented a prototype of SecureMR based on Hadoop, an open-source MapReduce implementation. Our analytical study and experimental results show that SecureMR can ensure data processing service integrity while imposing low performance overhead.
