Efficient privacy preservation of big data for accurate data mining

Abstract: Computing technologies pervade physical spaces and human lives, and produce a vast amount of data that is available for analysis. However, there is a growing concern that potentially sensitive data may become public if the collected data are not appropriately sanitized before being released for investigation. Although many privacy-preserving methods are available, they are often inefficient, scale poorly, or compromise data utility or privacy. This paper addresses these issues by proposing an efficient and scalable nonreversible perturbation algorithm, PABIDOT, for privacy preservation of big data via optimal geometric transformations. PABIDOT was tested for efficiency, scalability, attack resistance, and accuracy using nine datasets and five classification algorithms. Experiments show that PABIDOT outperforms two related privacy-preserving algorithms in execution speed, scalability, attack resistance, and accuracy in large-scale privacy-preserving data classification.
