Privacy Preserving Big Data Publishing: Challenges, Techniques, and Architectures

Data privacy plays a significant role in today’s digital world, where information is gathered at unprecedented rates from diverse sources. Privacy-preserving data publishing (PPDP) refers to the process of publishing personal data without compromising the privacy of the individuals it describes. A variety of approaches have been devised to protect consumer privacy by applying traditional anonymization mechanisms. These mechanisms are not well suited to Big Data, however, because the data generated today is not purely structured: it arrives at very high velocity from various sources and includes unstructured and semi-structured information, which makes it very difficult to process using traditional mechanisms. This chapter focuses on the challenges of Big Data, on privacy-preserving data mining (PPDM) and PPDP techniques for Big Data, and on how well these techniques can be scaled to process historical and real-time data together using the Lambda architecture. The chapter also proposes a distributed framework for privacy preservation in Big Data that incorporates natural language processing techniques.
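As a rough illustration of how a Lambda architecture might combine historical and real-time data under a privacy constraint, the following minimal sketch merges a batch view with a speed view at query time and suppresses any group smaller than k. It is not the framework proposed in the chapter: the names generalize, batch_view, SpeedView, and query, the record fields age and zip, and the choice of count suppression as the anonymization step are all assumptions made for this example.

```python
from collections import Counter


def generalize(event):
    """Hypothetical anonymization step: coarsen quasi-identifiers.

    Age is reduced to its decade and the ZIP code to its first three digits.
    """
    age_band = (event["age"] // 10) * 10
    return (age_band, event["zip"][:3])


def batch_view(historical_events):
    """Batch layer: recompute group counts from the full historical data set."""
    return Counter(generalize(e) for e in historical_events)


class SpeedView:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[generalize(event)] += 1


def query(batch, speed, k):
    """Serving layer: merge both views and publish only groups with at least k records."""
    merged = batch + speed.counts
    return {key: n for key, n in merged.items() if n >= k}


if __name__ == "__main__":
    historical = [
        {"age": 34, "zip": "12345"},
        {"age": 38, "zip": "12399"},
        {"age": 51, "zip": "98765"},
    ]
    speed = SpeedView()
    speed.update({"age": 31, "zip": "12301"})
    print(query(batch_view(historical), speed, k=3))
```

In this sketch the group for the 50s age band contains a single record and is therefore withheld, while the 30s group reaches the threshold only once the real-time record is merged with the historical ones.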
