When and How Mixup Improves Calibration

In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty. Although modern learning methods have achieved great success in predictive accuracy, generating calibrated confidence scores remains a major challenge. Mixup, a popular yet simple data augmentation technique based on taking convex combinations of pairs of training examples, has been empirically found to significantly improve confidence calibration across diverse applications. However, when and how Mixup helps calibration remains poorly understood. In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models for classification and regression. Interestingly, the calibration benefit of Mixup increases as the model capacity increases. We support our theory with experiments on common architectures and data sets. In addition, we study how Mixup improves calibration in semi-supervised learning. While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration. Our analysis provides new insights and a framework for understanding Mixup and calibration.
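The Mixup augmentation described above can be sketched in a few lines: each mixed example is a convex combination of two training examples and their labels, with the mixing weight drawn from a Beta(α, α) distribution as in Zhang et al. (2018). This is a minimal NumPy sketch, not the paper's experimental code; the function name and the default α are illustrative.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.4, rng=None):
    """Mix a batch (x, y) with a shuffled copy of itself.

    x: array of inputs, shape (n, d)
    y: array of one-hot labels, shape (n, k)
    Returns the mixed inputs, mixed labels, and the mixing weight lambda.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing weight in (0, 1)
    perm = rng.permutation(len(x))        # pair each example with a random partner
    x_mix = lam * x + (1 - lam) * x[perm] # convex combination of inputs
    y_mix = lam * y + (1 - lam) * y[perm] # same combination of labels
    return x_mix, y_mix, lam
```

Because the labels are mixed with the same weight as the inputs, the training targets become soft, which is one intuition for why Mixup discourages the overconfident predictions that harm calibration.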
