Discrimination- and privacy-aware patterns

Data mining is gaining societal momentum due to the ever-increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are therefore faced with unprecedented opportunities and risks: a deeper understanding of human behavior and of how our society works comes with a greater chance of privacy intrusion and of unfair discrimination based on the extracted patterns and profiles. Consider the case in which a set of patterns extracted from the personal data of a population of individuals is released for subsequent use in a decision-making process, e.g., granting or denying credit. First, the set of patterns may reveal sensitive information about individuals in the training population; second, decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. Although methods independently addressing privacy or discrimination in data mining have been proposed in the literature, in this context we argue that privacy and discrimination risks should be tackled together, and we present a methodology for doing so while publishing frequent pattern mining results. We describe a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, to achieve fair publishing of frequent patterns in combination with two possible privacy transformations: one based on $k$-anonymity and one based on differential privacy. Our proposed pattern sanitization methods based on $k$-anonymity yield both privacy- and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. Moreover, they achieve a better trade-off between protection and data quality than the sanitization methods based on differential privacy. Finally, the effectiveness of our proposals is assessed by extensive experiments.
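To make the notion of a "discrimination measure over frequent patterns" concrete, the following is a minimal sketch (not the paper's own code) of one measure commonly used in the discrimination-aware data mining literature: the extended lift (elift) of a classification rule A, B → C, where A is a protected-group itemset, B a context, and C a negative decision such as credit denial. All support counts and the alpha threshold below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: the extended lift (elift) discrimination measure for a
# classification rule A, B -> C derived from frequent-pattern supports.
# elift compares the confidence of the rule including the protected
# itemset A with the confidence of the same rule without it.

def confidence(supp_body_and_head: int, supp_body: int) -> float:
    """conf(X -> C) = supp(X u {C}) / supp(X)."""
    return supp_body_and_head / supp_body

def elift(supp_abc: int, supp_ab: int, supp_bc: int, supp_b: int) -> float:
    """elift(A, B -> C) = conf(A, B -> C) / conf(B -> C).

    Values well above 1 indicate that, within context B, membership in
    the protected group A raises the chance of the negative decision C.
    """
    return confidence(supp_abc, supp_ab) / confidence(supp_bc, supp_b)

# Toy support counts (hypothetical, for illustration only):
# supp(A,B,C)=40, supp(A,B)=50, supp(B,C)=100, supp(B)=200
measure = elift(40, 50, 100, 200)
print(measure)  # 0.8 / 0.5 = 1.6

# A pattern-sanitization step would flag (and distort) rules whose
# measure exceeds a legally grounded threshold alpha; 1.25 here is an
# assumed value, the actual threshold is jurisdiction-specific.
ALPHA = 1.25
print(measure >= ALPHA)  # True: this rule would be alpha-discriminatory
```

A sanitization method in the spirit of the paper would then perturb pattern supports just enough to bring every such rule's measure below alpha, which is where the tension with the privacy transformation (and the resulting pattern distortion) arises.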
