A study of top-k measures for discrimination discovery

Data mining approaches for discrimination discovery unveil contexts of possible discrimination against protected-by-law groups by extracting classification rules from a dataset of historical decision records. Rules are ranked according to a legally-grounded contrast measure defined over a 4-fold contingency table, such as risk difference, risk ratio, odds ratio, and a few others. Due to time and cost constraints, however, an anti-discrimination analyst takes only the top-k ranked rules into further consideration. In this paper, we study to what extent the sets of top-k ranked rules agree for any pair of measures.
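
As an illustration of the setting, the following minimal Python sketch computes three of the named measures from a 4-fold contingency table and compares the resulting top-k rule sets. Several things here are assumptions made for the example, not details taken from the paper: the cell layout (a, b, c, d), read as protected group denied/granted and unprotected group denied/granted; the toy rule names and counts; and the use of Jaccard overlap as the agreement score. All cells are assumed to be nonzero so that the ratios are defined.

```python
from typing import Callable, Dict, Set, Tuple

# Assumed cell layout (not from the paper):
#   a = protected group, benefit denied    b = protected group, benefit granted
#   c = unprotected group, benefit denied  d = unprotected group, benefit granted
Table = Tuple[int, int, int, int]


def risk_difference(t: Table) -> float:
    a, b, c, d = t
    return a / (a + b) - c / (c + d)


def risk_ratio(t: Table) -> float:
    a, b, c, d = t
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(t: Table) -> float:
    a, b, c, d = t
    return (a * d) / (b * c)


def top_k(rules: Dict[str, Table], measure: Callable[[Table], float], k: int) -> Set[str]:
    """Return the names of the k rules with the highest measure value."""
    ranked = sorted(rules, key=lambda name: measure(rules[name]), reverse=True)
    return set(ranked[:k])


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Overlap of two top-k sets; an illustrative agreement score only."""
    return len(x & y) / len(x | y)


# Hypothetical rules with made-up contingency tables.
rules: Dict[str, Table] = {
    "r1": (30, 70, 10, 90),
    "r2": (5, 95, 1, 99),
    "r3": (60, 40, 40, 60),
    "r4": (20, 80, 15, 85),
}

top_rd = top_k(rules, risk_difference, 2)
top_rr = top_k(rules, risk_ratio, 2)
top_or = top_k(rules, odds_ratio, 2)

print("RD vs RR agreement:", jaccard(top_rd, top_rr))
print("RR vs OR agreement:", jaccard(top_rr, top_or))
```

In this toy data, rule r2 has small absolute risks but a large ratio between them, so it ranks high under risk ratio and odds ratio and low under risk difference. This is the kind of disagreement between top-k sets that the study sets out to quantify.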