Enhancing security and privacy protection for MapReduce processing: the initial simulation work flow

The MapReduce programming model allows massive amounts of data to be processed in parallel across a cluster of machines in a distributed system. The tasks for MapReduce have been categorized into four areas: data management and storage, data analytics, online processing, and security and privacy protection. Sensitive data uploaded by users must be protected from unauthorized access to ensure its integrity, authenticity and privacy. It is important that data at rest, data in transit and nodes are managed securely by addressing the elements of data security and privacy protection, which are auditing, access control and privacy. The purpose of this study is to enhance security and privacy protection for the MapReduce model. Existing work is tailored more specifically to structure-based requirements, whereas there is still a need for solutions that let MapReduce better handle big data security and privacy protection for unstructured data. This paper presents the initial workflow of a simulation setup of MapReduce processing on the Hadoop platform, which demonstrates an access control enhancement for security and privacy protection by implementing a whitelist to control access in MapReduce processing.
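The whitelist idea can be illustrated with a minimal sketch in plain Python. It is not the paper's Hadoop setup: the user names, the `WHITELIST` set, and the `run_job` helper are hypothetical, and a word count stands in for the MapReduce job. The sketch only shows the general pattern of checking a submitter against a whitelist before any map or reduce work runs.

```python
from collections import defaultdict

# Hypothetical whitelist of user IDs allowed to submit jobs (an assumption,
# not taken from the paper).
WHITELIST = frozenset({"alice", "bob"})


def is_authorized(user, whitelist=WHITELIST):
    """Return True only if the submitting user is on the whitelist."""
    return user in whitelist


def map_phase(records):
    """Emit (word, 1) pairs for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the emitted counts per key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run_job(user, records, whitelist=WHITELIST):
    """Run the word count only for whitelisted users; reject everyone else."""
    if not is_authorized(user, whitelist):
        raise PermissionError(f"user {user!r} is not on the whitelist")
    return reduce_phase(map_phase(records))
```

Calling `run_job("alice", ["big data", "big cluster"])` returns the word counts, while a call from a user outside the whitelist raises `PermissionError` before any record is read. Placing the check ahead of the map phase means unauthorized submitters never touch the input data.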
