Reducing Side Effects of Hiding Sensitive Itemsets in Privacy Preserving Data Mining

Data mining is traditionally used to extract and analyze knowledge from large amounts of data. Private or confidential data may need to be sanitized or suppressed before it is shared or published. Privacy preserving data mining (PPDM) has thus become an important issue in recent years. The most common PPDM approach is to sanitize the database so that the sensitive information is hidden. In this paper, a novel hiding-missing-artificial utility (HMAU) algorithm is proposed to hide sensitive itemsets through transaction deletion. The transaction with the maximal ratio of sensitive to nonsensitive itemsets is selected and deleted in its entirety. Three side effects (hiding failures, missing itemsets, and artificial itemsets) are considered when evaluating whether a transaction needs to be deleted to hide the sensitive itemsets. Three weights, which users can set according to their requirements, express the relative importance of these three factors. Experiments are conducted to show the performance of the proposed algorithm in terms of execution time, number of deleted transactions, and number of side effects.
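To make the idea of hiding by transaction deletion concrete, the following is a minimal Python sketch, not the HMAU algorithm itself. The abstract does not give the exact ratio formula or say how the three weights enter the selection, so everything here is an assumption made for illustration: the names `frequent_itemsets`, `side_effects`, `weighted_cost`, and `sanitize`, the brute-force miner, the scoring rule (sensitive itemsets contained in a transaction divided by one plus the nonsensitive frequent itemsets it contains), and the toy database. The side-effect counts follow the usual PPDM definitions: a hiding failure is a sensitive itemset that is still frequent, a missing itemset is a nonsensitive frequent itemset that is lost, and an artificial itemset is a newly frequent one.

```python
from itertools import combinations


def frequent_itemsets(db, min_ratio):
    """Brute-force miner (toy data only): itemsets with support ratio >= min_ratio."""
    if not db:
        return set()
    counts = {}
    for transaction in db:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c / len(db) >= min_ratio}


def side_effects(before, after, sensitive):
    """Return (hiding failures, missing itemsets, artificial itemsets) as counts."""
    hiding_failures = sensitive & after      # sensitive itemsets still frequent
    missing = (before - sensitive) - after   # nonsensitive itemsets lost
    artificial = after - before              # itemsets that became frequent
    return len(hiding_failures), len(missing), len(artificial)


def weighted_cost(effects, weights):
    """Combine the three counts with user-chosen weights (assumed linear form)."""
    return sum(w * e for w, e in zip(weights, effects))


def sanitize(db, sensitive, min_ratio):
    """Greedily delete whole transactions until no sensitive itemset is frequent."""
    db = list(db)
    nonsensitive = frequent_itemsets(db, min_ratio) - sensitive

    def ratio(transaction):
        # Assumed scoring rule; the +1 only avoids division by zero.
        s = sum(1 for x in sensitive if x <= transaction)
        n = sum(1 for x in nonsensitive if x <= transaction)
        return s / (n + 1)

    while db and sensitive & frequent_itemsets(db, min_ratio):
        db.remove(max(db, key=ratio))
    return db


# Hypothetical toy database; each transaction is a set of single-letter items.
original = [frozenset(t) for t in ("abc", "ab", "abd", "cd", "ac", "bd")]
sensitive = {frozenset("ab")}
sanitized = sanitize(original, sensitive, min_ratio=0.5)
effects = side_effects(
    frequent_itemsets(original, 0.5),
    frequent_itemsets(sanitized, 0.5),
    sensitive,
)
print(len(sanitized), effects, weighted_cost(effects, (0.5, 0.3, 0.2)))
```

In this toy run the transaction {a, b} has the highest assumed ratio, so it is the only one deleted, and the sensitive itemset {a, b} drops below the threshold without any side effects. The brute-force miner is exponential in transaction length and is only suitable for illustration.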
