An Efficient and Scalable Privacy Preserving Algorithm for Big Data and Data Streams

Abstract: Advances in smart cyber-physical systems are producing a vast amount of valuable data that is becoming available for analysis. The data comes from various sources, such as healthcare, smart homes, and smart vehicles, and often includes private, potentially sensitive information that needs appropriate sanitization before being released for analysis. The incremental and fast nature of data generation in these systems necessitates scalable privacy-preserving mechanisms with high privacy and utility. However, privacy preservation often comes at the expense of data utility. We propose a new data perturbation algorithm, SEAL (Secure and Efficient data perturbation Algorithm utilizing Local differential privacy), based on Chebyshev interpolation and Laplacian noise, which provides a good balance between privacy and utility with high efficiency and scalability. Empirical comparisons with existing privacy-preserving algorithms show that SEAL excels in execution speed, scalability, accuracy, and attack resistance. SEAL provides flexibility in choosing the best possible privacy parameters, such as the amount of added noise, which can be tailored to the domain and dataset.
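To make the combination of Chebyshev interpolation and Laplacian noise concrete, the following is a minimal illustrative sketch and not the SEAL algorithm itself. The abstract does not specify SEAL's steps, so everything here is an assumption made for illustration: the function name `perturb_window`, the choice to sort and min-max scale a window before fitting, the addition of noise to the fitted coefficients, and the parameters `degree`, `epsilon`, and `sensitivity`. The noise scale `sensitivity / epsilon` is the generic Laplace-mechanism calibration; the sketch makes no claim about the privacy guarantee it achieves.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def perturb_window(values, degree=5, epsilon=1.0, sensitivity=1.0, rng=None):
    """Perturb one window of numeric data (illustrative assumption, not SEAL).

    Steps: sort the window, scale it to [0, 1], fit a Chebyshev series by
    least squares, add Laplace noise to the coefficients, evaluate the noisy
    series, and map the result back to the original order and scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)

    # Sort so the window forms a smooth, monotone curve that a
    # low-degree polynomial can approximate well.
    order = np.argsort(values)
    sorted_vals = values[order]

    # Min-max scale the values; guard against a constant window.
    lo, hi = sorted_vals[0], sorted_vals[-1]
    span = hi - lo if hi > lo else 1.0
    y = (sorted_vals - lo) / span

    # Chebyshev polynomials are defined on [-1, 1].
    x = np.linspace(-1.0, 1.0, len(y))
    coeffs = C.chebfit(x, y, degree)

    # Laplace noise with scale sensitivity / epsilon (generic calibration).
    noisy = coeffs + rng.laplace(0.0, sensitivity / epsilon, size=coeffs.shape)

    # Evaluate the noisy series and undo the scaling and the sorting.
    y_noisy = C.chebval(x, noisy)
    out = np.empty_like(values)
    out[order] = y_noisy * span + lo
    return out
```

In this sketch a smaller `epsilon` injects more noise and a larger one leaves the window closer to its polynomial approximation, which mirrors the tunable privacy and utility trade-off the abstract describes.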
