An integer programming approach for frequent itemset hiding

The rapid growth of transactional data soon drew attention to the need for its further exploitation. In this paper, we investigate the problem of preventing sensitive knowledge from being exposed in patterns extracted during association rule mining. Instead of hiding the produced rules directly, we hide the sensitive frequent itemsets that may lead to the production of these rules. As a first step, we introduce the notion of distance between two databases and a measure for quantifying it. We then propose a novel, exact algorithm for association rule hiding that minimizes the distance between the original database and its sanitized version (which can safely be released), and we evaluate it on real-world datasets, demonstrating its effectiveness in solving the problem.
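The ideas of itemset support, hiding, and database distance can be illustrated with a small sketch. It is not the paper's algorithm or its exact distance measure: the abstract does not give the definition, so the sketch assumes the distance is the number of item entries that differ between two databases holding the same transactions in the same order. The names `support`, `distance`, and `is_hidden`, and the toy data, are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def distance(original: List[Transaction], sanitized: List[Transaction]) -> int:
    """Assumed measure: number of item entries that differ, transaction by transaction."""
    if len(original) != len(sanitized):
        raise ValueError("databases must contain the same number of transactions")
    # Symmetric difference counts the items present in one version but not the other.
    return sum(len(a ^ b) for a, b in zip(original, sanitized))


def is_hidden(db: List[Transaction], itemset: FrozenSet[str], min_support: int) -> bool:
    """An itemset is hidden when its support falls below the mining threshold."""
    return support(db, itemset) < min_support


# Toy data (hypothetical): the itemset {a, b} is considered sensitive.
original = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]
sensitive = frozenset({"a", "b"})
min_support = 2

# One candidate sanitization: drop item "b" from the second transaction.
sanitized = list(original)
sanitized[1] = frozenset({"a"})

print(support(original, sensitive), is_hidden(original, sensitive, min_support))
print(support(sanitized, sensitive), is_hidden(sanitized, sensitive, min_support))
print(distance(original, sanitized))
```

Under these assumptions, an exact method would search over all such modifications for one that hides every sensitive itemset at the smallest possible distance; the sketch only checks a single hand-picked candidate.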
