A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining

Privacy-preserving data mining (PPDM) has become an interesting and emerging topic in recent years because it hides confidential information while still allowing useful knowledge to be discovered. Data sanitization is a common way to perturb a database so that sensitive or confidential information is hidden. PPDM is not a trivial task and can be considered a non-deterministic polynomial-time (NP)-hard problem. Many algorithms have been studied that derive optimal solutions through an evolutionary process, although most rely on straightforward or single-objective methods to discover the candidate transactions/items for sanitization. In this paper, we present a grid-based multi-objective particle swarm optimization algorithm (called GMPSO) that finds optimal solutions as candidates for sanitization. GMPSO uses two strategies for updating the global best (gbest) and personal best (pbest) positions during the evolutionary process. Moreover, the pre-large concept is adapted to speed up the evolutionary process, reducing the multiple database scans otherwise required in each iteration. Unlike single-objective algorithms, GMPSO derives multiple Pareto solutions based on Pareto dominance, and the side effects of the sanitization process can be significantly reduced. Experiments show that GMPSO produces fewer side effects than the previous single-objective algorithm and the NSGA-II-based approach, and that the pre-large concept also reduces computational cost compared with the NSGA-II-based algorithm.
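The abstract names two ingredients that make GMPSO multi-objective: Pareto dominance, and a grid over the objective space used when choosing gbest. The sketch below illustrates both in the style of the adaptive hypercube grid from Coello Coello et al.'s MOPSO; it is not the exact GMPSO update rule, whose details are in the full paper. Assumptions: each candidate sanitization is scored on three minimized side-effect objectives (taken here to be hiding failure, missing cost, and artificial cost), and the names `dominates`, `non_dominated`, `grid_cells`, `select_gbest`, and the `divisions` parameter are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical objective vector, e.g. (hiding failure, missing cost, artificial cost); all minimized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep the points no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def grid_cells(front: Sequence[Objectives], divisions: int = 5) -> Dict[Tuple[int, ...], List[Objectives]]:
    """Assign each archived point to a hypercube of a grid spanning the current front."""
    dims = len(front[0])
    lo = [min(p[d] for p in front) for d in range(dims)]
    hi = [max(p[d] for p in front) for d in range(dims)]
    cells: Dict[Tuple[int, ...], List[Objectives]] = defaultdict(list)
    for p in front:
        key = []
        for d in range(dims):
            span = hi[d] - lo[d]
            # Points on the upper bound are clamped into the last division.
            idx = 0 if span == 0 else min(int((p[d] - lo[d]) / span * divisions), divisions - 1)
            key.append(idx)
        cells[tuple(key)].append(p)
    return cells


def select_gbest(front: Sequence[Objectives], divisions: int = 5,
                 rng: Optional[random.Random] = None) -> Objectives:
    """Roulette-wheel choice of a cell weighted by 1/occupancy, then a random member of it.

    Favouring sparsely populated cells pushes the swarm toward unexplored parts of the front.
    """
    rng = rng or random.Random()
    cells = grid_cells(front, divisions)
    keys = list(cells)
    weights = [1.0 / len(cells[k]) for k in keys]
    cell = rng.choices(keys, weights=weights, k=1)[0]
    return rng.choice(cells[cell])


if __name__ == "__main__":
    candidates = [(0, 3, 1), (1, 1, 1), (2, 0, 2), (1, 3, 2), (3, 3, 3)]
    front = non_dominated(candidates)
    print(front)  # [(0, 3, 1), (1, 1, 1), (2, 0, 2)]
    print(select_gbest(front, divisions=2, rng=random.Random(0)))
```

Here `(1, 3, 2)` is dropped because `(0, 3, 1)` is no worse in every objective and better in two, and `(3, 3, 3)` is dominated by `(1, 1, 1)`. The surviving points form the set of Pareto solutions from which a leader is drawn; weighting by inverse cell occupancy is one common way to keep that set diverse.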
