Detection and Mitigation of Rare Subclasses in Neural Network Classifiers

Regions of a high-dimensional input space that are underrepresented in the training dataset reduce the performance of machine-learnt classifiers, and may lead to corner cases and unwanted bias when these classifiers are used in decision-making systems. When such regions belong to otherwise well-represented classes, their presence and negative impact are very hard to identify. We propose an approach for the detection and mitigation of these rare subclasses in neural network classifiers. The approach is underpinned by an easy-to-compute commonality metric that supports the detection of rare subclasses, and comprises methods for reducing their impact during both model training and model exploitation.
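The abstract does not define the commonality metric itself, so the following is only a minimal illustrative sketch of the general idea, under the assumption that commonality is measured as a sample's mean cosine similarity, in some feature space, to the training samples of its own class; samples whose commonality falls below a threshold are flagged as potential rare-subclass members. The function names `commonality` and `flag_rare` and the threshold value are hypothetical, not taken from the paper.

```python
import numpy as np

def commonality(x_feat, class_feats):
    """Mean cosine similarity between one sample's feature vector and
    the feature vectors of all training samples of the same class."""
    a = x_feat / np.linalg.norm(x_feat)
    B = class_feats / np.linalg.norm(class_feats, axis=1, keepdims=True)
    return float(np.mean(B @ a))

def flag_rare(feats, labels, threshold=0.5):
    """Boolean mask marking samples whose commonality with their own
    class is below the threshold (candidate rare-subclass members)."""
    mask = np.zeros(len(feats), dtype=bool)
    for i, (f, y) in enumerate(zip(feats, labels)):
        same_class = feats[labels == y]
        mask[i] = commonality(f, same_class) < threshold
    return mask

# Toy example: five feature vectors clustered near (1, 0) and one
# outlier near (-1, 0), all labelled as the same class.
feats = np.array([[1.0, 0.0], [0.95, 0.05], [1.0, 0.1],
                  [0.9, 0.0], [1.0, -0.05], [-1.0, 0.0]])
labels = np.zeros(6, dtype=int)
print(flag_rare(feats, labels))  # only the outlier is flagged
```

In practice the feature vectors would come from a trained network's penultimate layer rather than raw inputs, and the threshold would be calibrated on held-out data; both choices are assumptions of this sketch.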
