Hiding sensitive knowledge without side effects

Sensitive knowledge hiding in large transactional databases is one of the major goals of privacy-preserving data mining. However, only recently have researchers identified exact solutions for hiding knowledge that takes the form of sensitive frequent itemsets and their related association rules. Exact solutions hide vulnerable knowledge without critical compromises, such as hiding nonsensitive patterns or accidentally introducing infrequent itemsets among the frequent ones in the sanitized outcome. In this paper, we highlight the process of border revision, which plays a significant role in identifying exact hiding solutions, and we provide efficient algorithms for computing the revised borders. Furthermore, we review two algorithms that identify exact hiding solutions, and we extend one of them so that it identifies exact solutions for a wider range of problems than its original counterpart. We then introduce a novel framework for decomposing the hiding problems handled by these approaches and solving them in parallel. This framework substantially increases the size of the problems that both algorithms can handle and significantly decreases their runtime. Through experimentation, we demonstrate the effectiveness of these approaches in providing high-quality knowledge hiding solutions.
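To make the notion of a side effect concrete, the following minimal sketch checks a sanitized database against the three conditions an exact solution should satisfy: every sensitive itemset (and each of its frequent supersets) becomes infrequent, no other originally frequent itemset is lost, and no originally infrequent itemset becomes frequent. It is an illustration only and not one of the algorithms discussed in the paper. The function names, the absolute support threshold, and the toy transactions are all assumptions, and the brute-force enumeration is practical only for very small item universes.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset contained in at least min_support transactions.

    Brute-force enumeration over all item subsets (toy data only).
    """
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.add(itemset)
    return frequent


def side_effects(original, sanitized, sensitive, min_support):
    """Compare the frequent itemsets of the original and sanitized databases.

    An exact hiding solution, as assumed here, leaves all three sets empty.
    """
    f_orig = frequent_itemsets(original, min_support)
    f_san = frequent_itemsets(sanitized, min_support)
    # Sensitive itemsets and their frequent supersets must all disappear.
    must_hide = {i for i in f_orig if any(s <= i for s in sensitive)}
    ideal = f_orig - must_hide
    return {
        "still_exposed": f_san & must_hide,  # sensitive knowledge not hidden
        "lost": ideal - f_san,               # nonsensitive itemsets hidden
        "ghosts": f_san - f_orig,            # infrequent itemsets made frequent
    }


# Hypothetical toy data: four transactions, absolute support threshold of 2.
original = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
sensitive = [frozenset("ab")]

# Removing item b from the second transaction hides {a, b} and nothing else.
exact = [frozenset("abc"), frozenset("a"), frozenset("ac"), frozenset("bc")]
exact_report = side_effects(original, exact, sensitive, min_support=2)

# Removing item a from the first transaction also hides the nonsensitive {a, c}.
inexact = [frozenset("bc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
inexact_report = side_effects(original, inexact, sensitive, min_support=2)
```

In this toy setting both sanitizations hide the sensitive itemset, but only the first leaves all three report sets empty; the second reports {a, c} under "lost", which is the kind of compromise an exact solution avoids.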
