Privacy Problems with Anonymized Transaction Databases

In this paper, we consider privacy problems with anonymized transaction databases, i.e., transaction databases in which the items are renamed to hide sensitive information. In particular, we show how an anonymized transaction database can be deanonymized using non-anonymized frequent itemsets. We describe how the problem can be formulated as an integer programming task, study its computational complexity, discuss how the computations could be done more efficiently in practice, and experimentally examine the feasibility of the proposed approach.
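To make the attack setting concrete, the following is a minimal sketch and not the formulation used in the paper: it swaps the integer programming approach for an exhaustive search over item renamings, which is only workable for a handful of items. It assumes the attacker knows the exact supports of some itemsets over the original item names. The function names `support` and `consistent_mappings` and the toy data are hypothetical.

```python
from itertools import permutations


def support(db, itemset):
    """Count the transactions in db that contain every item of itemset."""
    return sum(1 for transaction in db if itemset <= transaction)


def consistent_mappings(anon_db, known_supports):
    """Yield each renaming (anonymized item -> original item) under which
    the anonymized database reproduces all known itemset supports.

    Brute force over all bijections; an illustrative stand-in for the
    integer programming formulation, not a practical method.
    """
    anon_items = sorted(set().union(*anon_db))
    orig_items = sorted(set().union(*known_supports))
    if len(anon_items) != len(orig_items):
        raise ValueError("sketch assumes both sides use the same number of items")
    for perm in permutations(anon_items):
        to_anon = dict(zip(orig_items, perm))  # original item -> anonymized item
        if all(
            support(anon_db, frozenset(to_anon[i] for i in itemset)) == count
            for itemset, count in known_supports.items()
        ):
            yield {anon: orig for orig, anon in to_anon.items()}


# Toy data (hypothetical): items were renamed to "a", "b", "c".
anon_db = [{"b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
known_supports = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"beer"}): 2,
    frozenset({"bread", "milk"}): 2,
    frozenset({"bread", "beer"}): 2,
}

for mapping in consistent_mappings(anon_db, known_supports):
    print(mapping)
```

In this toy case the singleton supports alone leave "bread" and "milk" interchangeable, and the pair supports are what single out one renaming. The search visits every permutation of the items, so its cost grows factorially with the number of items.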
