How should statistical profiling models be implemented when anti-discrimination policies prohibit basing predictions on characteristics such as race, gender, or age? Companies, schools, and social-program administrators typically address such concerns by simply excluding these sensitive characteristics from the models they estimate. However, other variables that may be correlated with the omitted characteristics, such as zip codes, credit scores, and job tenure, are routinely used and may serve as partial proxies for the excluded groups. We examine the importance of this issue for the federally mandated Worker Profiling and Reemployment Services system, in which states profile unemployment-insurance (UI) claimants and require workshop attendance from those predicted to be likely to exhaust their benefits. Using a large data set on UI claimants, we use a simple procedure to compare the approach commonly used by states with one that eliminates the ability of the modeling variables to proxy for the sensitive characteristics. This comparison establishes the degree to which program outcomes are affected by modeling variables serving as proxies for the excluded characteristics. We find a significant effect, especially across racial groups, which we demonstrate is largely driven by the correlation between race and zip codes. Our benchmark results suggest that eliminating the influence of the sensitive characteristics on the predictive process would decrease the fraction of required workshop attendees who are black by roughly 25%. We address the question of predictive accuracy and discuss the relevance of these findings for other settings such as mortgage lending, insurance pricing, and college admissions.

Contact Pope at dpope@econ.berkeley.edu and Sydnor at justin.sydnor@case.edu. *We thank Dan Black, David Card, Stefano DellaVigna, Jesse Leary, Jaren Pope, Matthew Rabin, Stephen Ross, Jesse Rothstein, Paige Skiba, Jeff Smith, and seminar participants at UC Berkeley for helpful comments and suggestions. We are also grateful to Dan Black and David Card for generously sharing data used for this project. All errors are our own.

In an increasing number of important economic settings, decision-makers use statistical predictions when determining how to treat people. Examples include mortgage and credit-card lending, insurance pricing, college admissions, and social-benefit provision. Which individual characteristics should modelers use to make these predictions? The simple answer is any characteristic that improves predictive accuracy. In many settings, though, the use of a number of “socially unacceptable predictors” (SUPs), such as race, gender, and age, in the predictive process is considered inappropriate statistical discrimination. Profilers typically respond by taking a “SUP-blind” approach, simply dropping the sensitive characteristics from their models.¹ However, at the same time they routinely use other variables, such as zip codes, credit scores, and job tenure, that may be highly correlated with the omitted characteristics. Because of this correlation, these variables may serve as partial proxies for the excluded categories. As a result, excluded variables such as race, gender, and age may continue to influence predictions by distorting the weight given to other variables relative to those variables’ direct effect on the outcome of interest.
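To fix ideas, the mechanism can be written as textbook omitted variable bias in a deliberately stylized setting; the two-regressor notation below is ours and is for illustration only, whereas the profiling models estimated in practice include many more covariates. Suppose the outcome $y_i$, say an indicator for exhausting UI benefits, depends linearly on a permitted predictor $x_i$, say a zip-code characteristic, and a SUP $s_i$, with an error term uncorrelated with both regressors:
\[
y_i = \beta_0 + \beta_1 x_i + \beta_2 s_i + \varepsilon_i .
\]
A SUP-blind model instead regresses $y_i$ on $x_i$ alone, and its estimated slope converges to
\[
\operatorname{plim}\hat{\beta}_1^{\text{blind}} = \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x_i, s_i)}{\operatorname{Var}(x_i)} ,
\]
so whenever the SUP both predicts the outcome ($\beta_2 \neq 0$) and is correlated with the permitted variable ($\operatorname{Cov}(x_i, s_i) \neq 0$), the weight placed on $x_i$ absorbs part of the SUP’s effect, and predicted outcomes still vary systematically with the SUP.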
We label the effect that this distortion has on predicted outcomes “implicit statistical discrimination.” Conceptually, implicit statistical discrimination is simply a reflection of the standard omitted-variable-bias problem. What makes it more interesting, however, is that the potential bias here does not result from unobserved variables; rather, it is the result of a deliberate effort to ignore the influence of certain known characteristics. While a researcher interested in the causal effect of job tenure on unemployment would not exclude age from her model, doing so is a routine solution for real-world practitioners concerned about discrimination. In these practical applications, does the residual influence of SUPs introduced by the standard approach affect predictions in a meaningful way, and, if so, is there a realistic alternative?

¹ In the literature on discrimination, treating individuals differently on the explicit basis of race, gender, or other protected characteristics is known as “disparate treatment.” What we call SUP-blindness means avoiding disparate treatment.
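To make the question of an alternative concrete, the sketch below contrasts the SUP-blind approach with one candidate fix: estimate the prediction model with the SUP included, then score every claimant at a common reference value of the SUP, so that the coefficients on the other variables cannot act as proxies for it. This is a stylized illustration rather than our estimation code; the variable names (exhausted_ui, zip_median_income, job_tenure, black) are hypothetical placeholders, and the linear probability model and the mean-imputation step are simplifying choices made for exposition, not necessarily the exact procedure applied to the UI data.

```python
# Stylized sketch (illustration only): compare a "SUP-blind" profiling model
# with one that includes the SUP in estimation but neutralizes it when scoring.
# df is assumed to be a pandas DataFrame of claimant-level data with
# hypothetical columns: exhausted_ui, zip_median_income, job_tenure, black.
import statsmodels.api as sm


def fit_lpm(df, covariates, outcome="exhausted_ui"):
    """Linear probability model of UI benefit exhaustion on the given covariates."""
    X = sm.add_constant(df[covariates])
    return sm.OLS(df[outcome], X).fit()


def sup_blind_scores(df, permitted):
    """Standard approach: drop the SUP and predict from permitted variables only."""
    model = fit_lpm(df, permitted)
    return model.predict(sm.add_constant(df[permitted]))


def sup_neutral_scores(df, permitted, sup="black"):
    """Alternative: estimate with the SUP included, then score every claimant at a
    common reference value of the SUP (here its sample mean), so the coefficients
    on the permitted variables cannot absorb the SUP's effect."""
    model = fit_lpm(df, permitted + [sup])
    df_ref = df.copy()
    df_ref[sup] = df[sup].mean()  # impute the same SUP value for every claimant
    return model.predict(sm.add_constant(df_ref[permitted + [sup]]))


# Example usage: rank claimants by each score, apply the same workshop-referral
# cutoff, and compare the share of referred claimants who are black.
# permitted = ["zip_median_income", "job_tenure"]
# blind = sup_blind_scores(claimants, permitted)
# neutral = sup_neutral_scores(claimants, permitted)
```

Scoring everyone at the same SUP value preserves the SUP’s explanatory role during estimation while removing it from the ranking itself, which is the sense in which the permitted variables are prevented from serving as proxies.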