The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making

Machine learning methods are increasingly being used to inform, or sometimes even directly to make, important decisions about humans. A number of recent works have focussed on the fairness of the outcomes of such decisions, particularly on avoiding decisions that affect users of different sensitive groups (e.g., race, gender) disparately. In this paper, we propose to consider the fairness of the process of decision making. Process fairness can be measured by estimating the degree to which people consider various features to be fair to use when making an important legal decision. We examine the task of predicting whether or not a prisoner is likely to commit a crime again once released, by analyzing the dataset on the COMPAS system compiled by ProPublica. We introduce new measures of people’s discomfort with using various features, show how these measures can be estimated, and consider the effect of removing the uncomfortable features on prediction accuracy and on outcome fairness. Our empirical analysis suggests that process fairness may be achieved with little cost to outcome fairness, but that some loss of accuracy is unavoidable.

1 Motivation and New Measures of Process Fairness

As machine learning methods are increasingly being used in decision making scenarios that affect human lives, there is a growing concern about the fairness of such decision making. These concerns have spawned a flurry of recent research activity into learning methods for detecting and avoiding unfairness in decision making [8, 9, 12, 15, 16, 18, 19]. Our focus in this paper is on the foundational notions of fairness that underlie these fair learning and unfairness detection methods.

We argue that the notions of fairness underlying much of the prior work are centered around the outcomes of the decision process. They are inspired in large part by the application of anti-discrimination laws in various countries [2], under which decision policies or practices (implemented by humans) can be declared discriminatory based on their effects on people belonging to certain sensitive demographic groups (e.g., gender, race). For instance, the notions of “individual fairness” in [8], “situational testing” in [15], and “disparate treatment” in [18] consider individuals who belong to different sensitive groups, yet share similar non-sensitive features (qualifications), and require them to receive similar decision outcomes. Similarly, the notions of “group fairness” in [19] and “disparate impact” in [18] are based on different sensitive groups (e.g., males and females, or blacks and whites) receiving beneficial decision outcomes in similar proportions. Finally, the notion of “disparate mistreatment” in [17] is rooted in the desire for different sensitive demographic groups to experience similar rates of errors in decision outcomes. Thus, in prior works, the fairness of decision making has been evaluated based on the decision outcomes.

In this paper, we make the case for notions of fairness that are based on the process of decision making rather than on the outcomes. Our notions of process fairness are motivated by the observation that in many decision making scenarios, humans tend to have a moral sense for whether or not it is fair to use an input feature in the decision making process. For instance, consider the task of predicting recidivism risk for an offender.
COMPAS is a commercial recidivism prediction tool that relies on a number of different types of user features, such as information about the Criminal history, Family criminality, Work, and Social environment of the offender. In a user survey that we conducted, we found that a strong majority of users felt that it was fair to use Criminal history, but unfair to use Family criminality. On the other hand, the features Work and Social environment were deemed fair and unfair (respectively) by only a weak majority of users. Such societal consensus (strong or weak) on the fairness of using a feature in a decision process may be rooted in prevailing cultural or social norms, political beliefs, legal (privacy) regulations, or historical precedents. Unfortunately, existing outcome-based fairness notions developed for learning systems fail to capture this intuitive human understanding of fairness. Instead, current fair learning mechanisms justify the means (process) by the ends (outcomes), ignoring the different levels of societal consensus on the desirability of using different features in decision making. We propose different notions of process fairness to redress this situation.

1.1 Defining process fairness

Suppose a learning method, say a classifier C, has been trained to make decisions using a set of features F. Intuitively, the classifier’s decision process would be considered fair by a user u only if the user u judges the use of every one of the features in the set F to be fair. We leverage this intuition to define the process fairness of the classifier C to be the fraction of all users who consider the use of every one of the features in F to be fair.

Our process fairness definition relies critically on users’ judgments about the use of individual features when making decisions. Note that a user’s judgment about a feature may change after they learn how using the feature might affect the decision outcomes. For instance, a user who initially considered a feature unfair for use in predicting recidivism risk might change their mind and deem the feature fair to use after learning that using the feature significantly improves the accuracy of prediction. Similarly, learning that using a feature might increase or decrease disparity in decision outcomes for different demographic groups (e.g., whites vs. blacks, or men vs. women) might make a user change their opinion on the fairness of using that feature in decision making.

To capture these considerations, recognizing that users’ judgments about using individual features might change once they learn about such impacts, we define three measures of process fairness: feature-apriori fairness, feature-accuracy fairness, and feature-disparity fairness. Consider a scenario for making some important decision. Let U denote the set of all members (‘users’) of society, and F denote the set of all possible features that might be used in the decision making process.

Feature-apriori fairness. For a given feature f ∈ F, let U_f ⊆ U denote the set of all users who consider the feature f fair to use without a priori knowledge of how its usage affects outcomes. Given a set of features F′ ⊆ F, let C_{F′} denote the classifier that uses those features. We define

\[
\text{feature-apriori fairness of } C_{F'} := \frac{\bigl|\bigcap_{f_i \in F'} U_{f_i}\bigr|}{|U|}. \tag{1}
\]
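For illustration, the following is a minimal sketch of how feature-apriori fairness, Eq. (1), could be computed from per-user survey responses; the user identifiers, feature names, and response sets below are hypothetical placeholders rather than actual survey data.

```python
# Minimal sketch: estimating feature-apriori fairness, Eq. (1), from
# hypothetical survey responses (all names and data below are illustrative).

def feature_apriori_fairness(survey, features_used):
    """Fraction of users who judge *every* feature in `features_used` fair to use.

    `survey` maps each user id to the set of features that user judged fair
    a priori, i.e., user u belongs to U_f exactly when f is in survey[u].
    """
    if not survey:
        raise ValueError("survey must contain at least one user")
    in_every_U_f = [
        u for u, fair_features in survey.items()
        if set(features_used) <= set(fair_features)  # u lies in the intersection of the U_f
    ]
    return len(in_every_U_f) / len(survey)


# Hypothetical responses from four users about three COMPAS-style feature groups.
survey = {
    "u1": {"criminal_history", "work"},
    "u2": {"criminal_history"},
    "u3": {"criminal_history", "work", "family_criminality"},
    "u4": {"work"},
}

print(feature_apriori_fairness(survey, {"criminal_history"}))          # 0.75
print(feature_apriori_fairness(survey, {"criminal_history", "work"}))  # 0.5
```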
Feature-accuracy fairness. Let U^f ⊆ U denote the set of all users who consider the feature f fair to use if it increases the accuracy of the classifier. Note that typically we expect U_f ⊆ U^f, though this need not always hold exactly (either due to noise in estimating user preferences, or due