The Prototype for Implementation of Security Issue in Big Data Application using Hadoop Server

A very large amount of data can be referred to as Big Data. Data of this size requires special methods to process and store. Hadoop Server is a distributed server that divides and partitions large data into multiple pieces for fast and efficient processing. Hadoop is an open-source solution for large-scale data processing, developed under the Apache Software Foundation and inspired by Google's published MapReduce and Google File System designs. It supports a variety of components and a distributed file system. MapReduce, Pig and Hive are components used for efficient software development. They work together with the Hadoop Distributed File System (HDFS), which stores large data across multiple nodes so that it can be processed in parallel. The study observes that large-scale data requires advanced processing to achieve the required level of performance. To avoid the problems of privacy leakage and access management, a security model has been developed for Big Data applications. This paper describes the security issue along with its solution. The proposed solution is implemented on a Hadoop server in both single-node and multi-node environments.
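
The abstract does not say how the security model is realized. The following Python sketch is purely illustrative and is not the paper's method. It simulates, on a single machine, how a MapReduce-style job could partition its input into blocks, map each block, and reduce the intermediate pairs. It also shows where a privacy step and an access check could fit. The names `SECRET_KEY`, `AUTHORIZED_USERS`, `split_into_blocks`, `pseudonymize`, `map_phase`, `reduce_phase` and `run_job` are hypothetical. The two security measures are both assumptions chosen for illustration: HMAC-SHA256 pseudonymization of identifiers in the map phase, and an allow-list check before the job runs.

```python
import hashlib
import hmac
from collections import defaultdict
from itertools import islice

# Hypothetical secret key for pseudonymization; a real deployment would
# load it from a key store instead of hard-coding it.
SECRET_KEY = b"example-key"

# Hypothetical access-control list: only these users may run the job.
AUTHORIZED_USERS = {"analyst"}


def split_into_blocks(records, block_size):
    """Partition the input into fixed-size pieces, loosely mimicking how
    HDFS stores a large file as blocks spread over several nodes."""
    it = iter(records)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def pseudonymize(value):
    """Replace an identifier with a truncated keyed hash (HMAC-SHA256),
    so raw identifiers never appear in the intermediate or final output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def map_phase(block):
    """Emit (pseudonymous_user, 1) for every (user_id, item) record."""
    for user_id, _item in block:
        yield pseudonymize(user_id), 1


def reduce_phase(pairs):
    """Group intermediate pairs by key (the shuffle) and sum the values."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(user, records, block_size=2):
    """Check access, then run map over every block and reduce the results."""
    if user not in AUTHORIZED_USERS:
        raise PermissionError(f"user {user!r} may not run this job")
    intermediate = []
    for block in split_into_blocks(records, block_size):
        # In Hadoop each block would be mapped on a different node;
        # here the blocks are simply processed one after another.
        intermediate.extend(map_phase(block))
    return reduce_phase(intermediate)


if __name__ == "__main__":
    records = [("alice", "book"), ("bob", "pen"), ("alice", "lamp"),
               ("carol", "mug"), ("bob", "ink")]
    result = run_job("analyst", records)
    print(sorted(result.values()))  # prints [1, 2, 2]
```

Calling `run_job("analyst", records)` returns per-user counts keyed by pseudonyms rather than names. Any user outside the allow-list gets a `PermissionError`. A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the pseudonyms by hashing guessed identifiers.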
