Using t-closeness anonymity to control for non-discrimination

We investigate the relation between t-closeness, a well-known model of data anonymization against attribute disclosure, and α-protection, a model of the social discrimination hidden in data. We show that t-closeness implies bd_f(t)-protection, for a bound function bd_f() depending on the discrimination measure f() at hand. This allows us to adapt inference control methods, such as the Mondrian multidimensional generalization technique and the SABRE bucketization and redistribution framework, to the purpose of non-discrimination data protection. The parallel between the two analytical models raises intriguing issues on the interplay between data anonymization and non-discrimination research in data protection.
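The link between the two models can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: the decision attribute is binary, closeness is measured with the variational distance (a simplification of the Earth Mover's Distance commonly used for t-closeness), the discrimination measure f() is the risk difference, and the protected and unprotected groups are themselves the groups checked for closeness. Under these assumptions the triangle inequality bounds the risk difference by 2t; this illustrates the idea of a bound function and is not presented as the paper's bd_f(). All function names and the toy data are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def closeness(groups):
    """Smallest t for which every group is within t of the overall distribution."""
    overall = distribution([v for g in groups.values() for v in g])
    return max(
        variational_distance(distribution(g), overall) for g in groups.values()
    )


def risk_difference(protected, unprotected, negative="deny"):
    """Difference in the rate of negative decisions between the two groups."""
    p1 = protected.count(negative) / len(protected)
    p2 = unprotected.count(negative) / len(unprotected)
    return p1 - p2


# Toy data (hypothetical): decisions for a protected and an unprotected group.
groups = {
    "protected": ["deny"] * 3 + ["grant"] * 7,
    "unprotected": ["deny"] * 2 + ["grant"] * 8,
}

t = closeness(groups)
rd = risk_difference(groups["protected"], groups["unprotected"])

# Each group's denial rate is within t of the overall rate,
# so the two rates differ by at most 2t (small tolerance for float rounding).
assert abs(rd) <= 2 * t + 1e-9
print(f"t = {t:.3f}, risk difference = {rd:.3f}, bound 2t = {2 * t:.3f}")
```

In this toy setting, lowering t through anonymization also tightens the bound on the risk difference, which is the intuition behind reusing t-closeness methods for non-discrimination.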
