Tracking Machine Learning Bias Creep in Traditional and Online Lending Systems with Covariance Analysis

Machine Learning (ML) algorithms are embedded in online banking services, where they drive decisions about consumers' credit cards, car loans, and mortgages. These algorithms are sometimes biased, producing unfair decisions toward certain groups. A common approach to addressing such bias is to simply drop sensitive attributes (e.g., gender) from the training data. However, sensitive attributes can be indirectly represented by other attributes in the data (e.g., maternity leave taken). This paper addresses the problem of identifying attributes that can mimic sensitive attributes by proposing a new approach based on covariance analysis. Our evaluation on two credit datasets, extracted from a traditional and an online banking institution respectively, shows that our approach (i) effectively identifies the attributes in the data that encapsulate sensitive information, and (ii) reduces bias in ML models while maintaining their overall performance.
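
To make the core idea concrete, the sketch below screens features for proxies of a sensitive attribute via covariance. This is a minimal illustration, not the paper's actual method: the `find_proxy_attributes` helper, the pandas DataFrame input, and the 0.1 threshold are all assumptions introduced here. The raw covariance is normalized by the two standard deviations (yielding a Pearson correlation) so that one threshold applies regardless of feature scale.

```python
# Minimal sketch of covariance-based proxy-attribute screening (illustrative
# only; the helper name, DataFrame input, and 0.1 threshold are assumptions,
# not the paper's implementation).
import pandas as pd


def find_proxy_attributes(df: pd.DataFrame, sensitive: str,
                          threshold: float = 0.1) -> list[str]:
    """Return the features whose covariance with `sensitive`, normalized by
    the standard deviations (i.e., Pearson correlation), exceeds `threshold`
    in absolute value."""
    candidates = df.drop(columns=[sensitive])
    # Raw covariance of each candidate column with the sensitive attribute.
    cov = candidates.apply(lambda col: col.cov(df[sensitive]))
    # Normalize so the score is scale-free and comparable across features.
    scores = cov / (candidates.std() * df[sensitive].std())
    return scores[scores.abs() > threshold].index.tolist()


# Hypothetical usage on a numerically encoded loan dataset: an attribute such
# as 'maternity_leave_taken' would likely be flagged as a gender proxy and
# could be dropped alongside the sensitive attribute itself before training.
# proxies = find_proxy_attributes(loans_df, sensitive="gender")
# debiased_df = loans_df.drop(columns=proxies + ["gender"])
```

In practice, the flagged set depends on the threshold: a lower value prunes proxies more aggressively at the cost of predictive signal, which mirrors the trade-off the paper reports between bias reduction and overall model performance.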
