Recent Developments on Security and Reliability in Large-Scale Data Processing with MapReduce

The demand for access to large volumes of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in parallel on non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data analytics. However, since such analytics are increasingly demanded in mission-critical applications, MapReduce frameworks must provide security and reliability in order to manage sensitive information and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache under the name Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure. They review the available solutions and their limitations in supporting security and reliability within the context of MapReduce frameworks. The authors conclude by describing the ongoing evolution of such solutions and the open issues for improvement, which could be challenging research opportunities for academic researchers.
