Privacy-Preserving Layer over MapReduce on Cloud

Cloud computing gives users powerful and economical infrastructure for handling ever-growing Big Data with data-processing frameworks such as MapReduce. Running on cloud infrastructure, MapReduce has been widely adopted by companies and organizations to process very large data sets. However, MapReduce aggravates privacy concerns: privacy-sensitive information scattered across multiple data sets is easier to recover when both data and computational power are abundant. Existing approaches protect the privacy of data processed by MapReduce with techniques such as access control or encryption. These techniques fail to preserve privacy cost-effectively in common scenarios where data are processed on the cloud for analytics, mining, and sharing. In this paper, we therefore propose a flexible, scalable, dynamic, and cost-effective privacy-preserving layer over the MapReduce framework. The layer ensures that data satisfy the given privacy requirements while retaining their utility before they are processed by subsequent MapReduce tasks. We have also developed a prototype system that implements the privacy-preserving layer.
