An overview of active learning methods for insurance with fairness appreciation

This paper addresses, and proposes solutions to, some challenges in the adoption of machine learning in insurance that arise with the democratization of model deployment. The first challenge is reducing the labelling effort (and hence focusing on data quality) with the help of active learning, a feedback loop between model inference and an oracle: since unlabeled data is usually abundant in insurance, active learning can become a significant asset in reducing the labelling cost. To that end, this paper sketches out various classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is fairness in model inferences. We introduce a post-processing fairness method for multi-class tasks and integrate it into this active learning framework to address both issues. Finally, numerical experiments on unfair datasets highlight that the proposed setup offers a good compromise between model precision and fairness.
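
As an illustration of the feedback loop between model inference and an oracle described above, the following is a minimal sketch of pool-based active learning with entropy-based uncertainty sampling, one classical strategy. It is not the paper's implementation: the nearest-centroid model, the function names (`predictive_entropy`, `centroid_probs`, `active_learning_loop`), the synthetic data, and the batch size are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def centroid_probs(X_lab, y_lab, X, n_classes):
    """Toy stand-in model (an assumption): softmax over negative squared
    distances to class centroids. Requires at least one labeled sample per class."""
    centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def active_learning_loop(X_pool, oracle, init_idx, n_classes, n_rounds=5, batch=10):
    """Repeatedly fit on the labeled set, score the unlabeled pool by entropy,
    and ask the oracle to label the most uncertain points."""
    labeled = list(init_idx)
    labels = {i: oracle(i) for i in labeled}
    for _ in range(n_rounds):
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if len(unlabeled) == 0:
            break
        y_lab = np.array([labels[i] for i in labeled])
        probs = centroid_probs(X_pool[labeled], y_lab, X_pool[unlabeled], n_classes)
        scores = predictive_entropy(probs)
        query = unlabeled[np.argsort(-scores)[:batch]]  # most uncertain first
        for i in query:
            labels[int(i)] = oracle(int(i))  # the oracle supplies the true label
            labeled.append(int(i))
    return labeled, labels

# Synthetic pool: three Gaussian clusters; the oracle simply looks up the true label.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y_true = np.repeat(np.arange(3), 100)
X_pool = centers[y_true] + rng.normal(size=(300, 2))
init_idx = [0, 1, 100, 101, 200, 201]  # two labeled seeds per class

labeled, labels = active_learning_loop(
    X_pool, lambda i: int(y_true[i]), init_idx, n_classes=3
)
print(len(labeled))  # 56 = 6 initial seeds + 5 rounds x 10 queries
```

In practice the toy model would be replaced by the classifier under study, and the oracle by a human annotator; only the query rule (here, highest predictive entropy) changes between the classical strategies.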
