Towards reliable and fair probabilistic predictions: field-aware calibration with neural networks

In machine learning, probabilistic predictions are often observed to disagree with the average actual outcomes on certain subsets of the data. This phenomenon, known as miscalibration, is responsible for much of the unreliability and unfairness of practical machine learning systems. In this paper, we put forward an evaluation metric for calibration, coined the field-level calibration error, that measures the bias in predictions over the input fields that the decision maker cares about. We show that existing calibration methods perform poorly under this new metric. Specifically, after learning a calibration mapping over the validation dataset, existing methods yield limited improvements on our error metric and completely fail to improve non-calibration metrics such as the AUC score. We propose Neural Calibration, a new calibration method that learns to calibrate by making full use of all input information over the validation set. We test our method on five large-scale real-world datasets. The results show that Neural Calibration significantly improves over uncalibrated predictions on all well-known metrics, including the negative log-likelihood, the Brier score, and the AUC score, as well as our proposed field-level calibration error.
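As a concrete illustration of the proposed metric, the sketch below computes a field-level calibration error with respect to a single categorical input field. The abstract does not spell out the exact formula, so the definition used here, the field-value-weighted absolute gap between the mean predicted probability and the mean observed outcome within each field value, is an assumption, and the function name and arguments are hypothetical.

```python
import numpy as np

def field_level_calibration_error(y_true, y_prob, field_values):
    """Hypothetical sketch of a field-level calibration error.

    Assumes the metric aggregates, over the values of one categorical
    input field, the absolute difference between the mean predicted
    probability and the mean observed outcome, weighted by the fraction
    of examples carrying that field value.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    field_values = np.asarray(field_values)

    n = len(y_true)
    error = 0.0
    for value in np.unique(field_values):
        mask = field_values == value
        # Absolute bias of the predictions within this field value.
        bias = abs(y_prob[mask].mean() - y_true[mask].mean())
        error += (mask.sum() / n) * bias
    return error

# Toy usage: predictions that are systematically high for one field value
# inflate the field-level error even if the overall average looks calibrated.
y = np.array([0, 1, 0, 1, 1, 0])
p = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
field = np.array(["US", "US", "US", "UK", "UK", "UK"])
print(field_level_calibration_error(y, p, field))
```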
