Online Fairness-Aware Learning with Imbalanced Data Streams

Data-driven learning algorithms are employed in many online applications in which data become available over time, such as network monitoring, stock price prediction, and job applications. The underlying data distribution might evolve over time, calling for model adaptation as new instances arrive and old instances become obsolete. In such dynamic environments, the so-called data streams, fairness-aware learning cannot be treated as a one-off requirement; rather, it must be a continual requirement over the stream. Recent fairness-aware stream classifiers ignore the problem of class imbalance, which manifests in many real-life applications, and mitigate discrimination mainly because they “reject” minority instances at large due to their inability to effectively learn all classes. In this work, we propose FABBOO, an online fairness-aware approach that maintains a valid and fair classifier over the stream. FABBOO is an online boosting approach that changes the training distribution in an online fashion by monitoring the stream's class imbalance, and tweaks its decision boundary to mitigate discriminatory outcomes over the stream. Experiments on 8 real-world and 1 synthetic datasets from different domains with varying class imbalance demonstrate the superiority of our method over state-of-the-art fairness-aware stream approaches, with relative increases in the range of [11.2%-14.2%] in balanced accuracy, [22.6%-31.8%] in gmean, [42.5%-49.6%] in recall, [14.3%-25.7%] in kappa and [89.4%-96.6%] in statistical parity (fairness).
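The abstract names its evaluation measures without defining them. As a rough illustration only, the sketch below accumulates recall, balanced accuracy, gmean and statistical parity over a binary stream, one instance at a time. It assumes the commonly used definitions (balanced accuracy as the mean of recall and specificity, gmean as their geometric mean, statistical parity as the gap in positive-prediction rates between the non-protected and protected groups). The class name `StreamMetrics` and all field names are hypothetical, and this is not FABBOO itself nor necessarily the exact formulation used in the paper.

```python
from dataclasses import dataclass
from math import sqrt


def _ratio(num: int, den: int) -> float:
    """Safe division: returns 0.0 when nothing has been counted yet."""
    return num / den if den else 0.0


@dataclass
class StreamMetrics:
    """Cumulative counters updated once per stream instance (hypothetical helper)."""
    tp: int = 0
    fn: int = 0
    tn: int = 0
    fp: int = 0
    pos_protected: int = 0      # positive predictions for the protected group
    n_protected: int = 0
    pos_unprotected: int = 0    # positive predictions for the non-protected group
    n_unprotected: int = 0

    def update(self, y_true: int, y_pred: int, protected: bool) -> None:
        # Confusion-matrix counts (label 1 is the positive class).
        if y_true == 1:
            if y_pred == 1:
                self.tp += 1
            else:
                self.fn += 1
        else:
            if y_pred == 1:
                self.fp += 1
            else:
                self.tn += 1
        # Per-group positive-prediction counts for statistical parity.
        if protected:
            self.n_protected += 1
            self.pos_protected += y_pred
        else:
            self.n_unprotected += 1
            self.pos_unprotected += y_pred

    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        return (self.recall() + self.specificity()) / 2

    def gmean(self) -> float:
        return sqrt(self.recall() * self.specificity())

    def statistical_parity(self) -> float:
        # Assumed convention: non-protected rate minus protected rate.
        return (_ratio(self.pos_unprotected, self.n_unprotected)
                - _ratio(self.pos_protected, self.n_protected))


# Toy stream of (true label, predicted label, is_protected); values are made up.
stream = [
    (1, 1, False), (1, 0, True), (0, 0, True), (0, 0, False),
    (1, 1, False), (0, 1, False), (0, 0, True), (1, 1, True),
]
metrics = StreamMetrics()
for y_true, y_pred, protected in stream:
    metrics.update(y_true, y_pred, protected)
```

Under this convention a statistical parity of 0 means both groups receive positive predictions at the same rate, and larger absolute values indicate a larger disparity. On an imbalanced stream, plain accuracy can look high while recall on the minority class collapses, which is why measures such as balanced accuracy and gmean are reported instead.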
