k-NN as an implementation of situation testing for discrimination discovery and prevention

Supported by the legally grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention in a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we observe a significant difference in treatment between its neighbors belonging to a protected-by-law group and its neighbors not belonging to it. Discrimination discovery boils down to extracting a classification model from the labeled tuples. Discrimination prevention is tackled by changing the decision value of tuples labeled as discriminated before training a classifier. The approach of this paper overcomes legal weaknesses and technical limitations of existing proposals.
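
The labeling step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the Euclidean distance, the toy data, the restriction to protected tuples with a negative decision, the threshold value, and all function and variable names (`k_nearest`, `treatment_gap`, `label_discriminated`, `threshold`) are assumptions made for illustration. For each candidate tuple, the sketch compares the denial rate among its k nearest protected neighbors with the denial rate among its k nearest unprotected neighbors, and labels the tuple as discriminated when the gap reaches the threshold. The prevention step is then sketched as flipping the decision of the labeled tuples.

```python
import math


def k_nearest(index, candidates, points, k):
    """Return the k candidate indices closest to points[index] (Euclidean)."""
    others = [j for j in candidates if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def treatment_gap(index, points, protected, denied, k):
    """Denial rate among protected neighbors minus that among unprotected ones."""
    prot = [j for j in range(len(points)) if protected[j]]
    unprot = [j for j in range(len(points)) if not protected[j]]
    near_prot = k_nearest(index, prot, points, k)
    near_unprot = k_nearest(index, unprot, points, k)
    p1 = sum(denied[j] for j in near_prot) / len(near_prot)
    p2 = sum(denied[j] for j in near_unprot) / len(near_unprot)
    return p1 - p2


def label_discriminated(points, protected, denied, k=2, threshold=0.5):
    """Label protected, denied tuples whose treatment gap reaches the threshold.

    Assumption: only protected tuples with a negative decision are candidates;
    every other tuple is labeled False.
    """
    labels = []
    for i in range(len(points)):
        if protected[i] and denied[i]:
            gap = treatment_gap(i, points, protected, denied, k)
            labels.append(gap >= threshold)
        else:
            labels.append(False)
    return labels


# Toy data (hypothetical): two well-separated clusters of applicants.
# Cluster A: protected applicants are denied, similar unprotected ones are not.
# Cluster B: everyone is denied, regardless of group membership.
points = [
    (1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9), (1.2, 1.1),  # cluster A
    (5.0, 5.0), (5.1, 5.0), (4.9, 5.1), (5.0, 4.9), (5.2, 5.1),  # cluster B
]
protected = [True, True, True, False, False,
             True, True, True, False, False]
denied = [True, True, True, False, False,
          True, True, True, True, True]

labels = label_discriminated(points, protected, denied, k=2, threshold=0.5)

# Prevention sketch: flip the decision of tuples labeled as discriminated
# before any classifier is trained on the data.
corrected = [d and not flagged for d, flagged in zip(denied, labels)]

print(labels)
print(corrected)
```

In this toy dataset only the protected tuples of cluster A are labeled, because their unprotected neighbors received the opposite decision; the protected tuples of cluster B are not, since their unprotected neighbors were denied as well. A classifier trained on `labels` would correspond to the discovery step, and one trained on `corrected` to the prevention step.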
