DP-QIC: A differential privacy scheme based on quasi-identifier classification for big data publication

In the era of big data, privacy protection has become a central topic in data publication. Traditional privacy protection methods, such as k-anonymity and its extensions, are not fully secure, because an adversary with background knowledge can still determine the owner of a record. Differential privacy offers a sound alternative, but existing solutions ignore the correlation between sensitive attributes and other attributes. In this paper, we propose DP-QIC, a new differential privacy scheme based on quasi-identifier classification for big data publication. It is a data publishing scheme built on the obfuscation of attribute correlation. We introduce a quasi-identifier classification based on sensitive attributes, together with a privacy ratio for evaluating the vulnerability of a data set. DP-QIC achieves privacy protection through four steps: data collection, grouping and shuffling, generalization, merging, and noise adding, while retaining the overall statistical characteristics of the data set. Moreover, the exponential mechanism and the Laplace mechanism are integrated to provide higher flexibility and a stronger level of privacy protection, so DP-QIC can be applied to the privacy processing of different data groups in future development. Finally, we compare the performance of our scheme with that of two other well-known schemes. Experimental results demonstrate that DP-QIC has clear advantages in data utility, privacy protection, and processing efficiency.
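
As a point of reference for the two mechanisms named above, the following is a minimal sketch of the standard textbook Laplace and exponential mechanisms. It is not the DP-QIC algorithm itself: the function names, the `rng` parameter, the utility scores, and the generalization ranges in the usage lines are all assumptions made for illustration.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Return true_value perturbed with Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * utility(candidate) / (2 * sensitivity))."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scores = [utility(c) for c in candidates]
    top = max(scores)
    # Subtracting the best score leaves the probabilities unchanged
    # and avoids overflow in exp().
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Usage with hypothetical values: a noisy count for a numeric query, and a
# private choice among hypothetical generalization ranges for an age attribute.
rng = random.Random(0)
noisy_count = laplace_mechanism(120, sensitivity=1, epsilon=0.5, rng=rng)
scores = {"20-29": 3.0, "20-39": 2.0, "0-99": 1.0}  # assumed utility scores
chosen = exponential_mechanism(list(scores), scores.get, sensitivity=1.0,
                               epsilon=1.0, rng=rng)
```

The Laplace mechanism suits numeric outputs such as counts, while the exponential mechanism suits selection among discrete options; how DP-QIC assigns each mechanism to its steps is described in the paper, not in this sketch.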
