A Study on the Impact of Data Anonymization on Anti-discrimination

In recent years, data mining has raised concerns about invasion of individuals' privacy and about potential discrimination based on the extracted patterns and profiles. Efforts to counter these risks have led to the development of privacy-preserving data mining (PPDM) techniques and of anti-discrimination techniques in data mining. However, there is an evident gap between the large body of research on data privacy technologies and the recent, early results on anti-discrimination technologies. This work presents a study of the relation between data anonymization, as developed in the privacy technologies literature, and anti-discrimination. We discuss how different data anonymization techniques affect datasets that contain discriminatory bias.
