Mining Association Rules under Privacy Constraints

Data mining services require accurate input data for their results to be meaningful, but privacy concerns may impel users to provide spurious information. In this chapter, we study whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. Our analysis is in the context of extracting association rules from large historical databases, a popular mining process that identifies interesting correlations between database attributes. We analyze the various schemes that have been proposed for this purpose with regard to a variety of parameters including the degree of trust, privacy metric, model accuracy and mining efficiency.
