A Framework for Determining the Fairness of Outlier Detection

Outlier detection (OD) is a widely studied problem whose goal is to identify points in a data set that are considered anomalous. Among all methods used in AI and data science, OD is perhaps the most controversial, because common applications such as detecting credit card fraud, cyber-intrusion, and terrorist activity all involve suggesting that someone is committing a serious crime. Despite this, there is little work on fair outlier detection. We show how to determine whether an outlier detection algorithm’s output is fair with respect to multiple protected status variables (PSVs). We do so by formulating several combinatorial problems, each of which attempts to find an explanation, expressed in terms of the PSVs, that differentiates the outlier group from the normal group. We argue that if these explanation problems have no solution, then the output of the algorithm can be considered fair, and we give a probabilistic interpretation of our work. Because we prove that the underlying combinatorial problems are computationally intractable (i.e., NP-hard), our approaches cannot be efficiently gamed or side-stepped.
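The idea of searching for a PSV-based explanation can be illustrated with a small brute-force sketch. This is not the paper's formulation: the function name `find_explanation`, the restriction to conjunctions of PSV values, and the two coverage thresholds are assumptions introduced here only to make the idea concrete. The search looks for a rule such as "gender = F" that matches most outliers and few normal points. If no such rule exists, the detector's output would be considered fair under this simplified criterion.

```python
from itertools import combinations, product

def find_explanation(records, is_outlier, psvs, max_terms=2,
                     min_outlier_cover=0.8, max_normal_cover=0.2):
    """Return a conjunction of PSV values separating outliers from normals, or None.

    All names and thresholds are hypothetical; they are not taken from the paper.
    """
    outliers = [r for r, flag in zip(records, is_outlier) if flag]
    normals = [r for r, flag in zip(records, is_outlier) if not flag]
    if not outliers or not normals:
        return None

    # Enumerate conjunctions with 1..max_terms PSVs (exponential in general).
    for size in range(1, max_terms + 1):
        for attrs in combinations(psvs, size):
            value_sets = [sorted({r[a] for r in records}) for a in attrs]
            for values in product(*value_sets):
                rule = dict(zip(attrs, values))

                def matches(r, rule=rule):
                    return all(r[a] == v for a, v in rule.items())

                out_cover = sum(matches(r) for r in outliers) / len(outliers)
                norm_cover = sum(matches(r) for r in normals) / len(normals)
                if out_cover >= min_outlier_cover and norm_cover <= max_normal_cover:
                    return rule  # an explanation exists: evidence of unfairness
    return None  # no explanation found under these thresholds


records = [
    {"gender": "F", "age": "young"},
    {"gender": "F", "age": "old"},
    {"gender": "M", "age": "young"},
    {"gender": "M", "age": "old"},
    {"gender": "M", "age": "young"},
    {"gender": "M", "age": "young"},
]
biased_flags = [True, True, False, False, False, False]
mixed_flags = [True, False, True, False, False, False]

biased_rule = find_explanation(records, biased_flags, ["gender", "age"])
mixed_rule = find_explanation(records, mixed_flags, ["gender", "age"])
```

The exhaustive enumeration grows exponentially with the number of PSVs and terms, which is consistent with the abstract's point that the underlying problems are NP-hard.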
