Moment Multicalibration for Uncertainty Estimation

We show how to achieve the notion of "multicalibration" from Hébert-Johnson et al. [2018] not just for means, but also for variances and other higher moments. Informally, this means that we can find regression functions which, given a data point, can make point predictions not just for the expectation of its label, but for higher moments of its label distribution as well, and those predictions match the true distributional quantities when averaged not only over the population as a whole, but also when averaged over an enormous number of finely defined subgroups. This yields a principled way to estimate the uncertainty of predictions on many different subgroups, and to diagnose potential sources of unfairness in the predictive power of features across subgroups. As an application, we show that our moment estimates can be used to derive marginal prediction intervals that are simultaneously valid as averaged over all of the (sufficiently large) subgroups for which moment multicalibration has been obtained.
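As a brief, hedged illustration of the final application (a sketch of the standard mean-and-variance tail bound that motivates the result, using notation $\mu$, $\sigma$, $\delta$ that is not taken from the abstract, and not the paper's exact construction): suppose a predictor is multicalibrated for the first two moments, so that on each sufficiently large subgroup $G$ the predicted mean $\mu(x)$ and predicted variance $\sigma^2(x)$ match the true conditional moments on average over $G$. A Chebyshev-style argument then yields marginal prediction intervals of the form

$$\left[\, \mu(x) - \frac{\sigma(x)}{\sqrt{\delta}}, \;\; \mu(x) + \frac{\sigma(x)}{\sqrt{\delta}} \,\right],$$

which cover the label with probability at least $1 - \delta$, marginally over the draw of a point from $G$. The constants and the precise form of the intervals in the paper may differ; this is only the textbook bound $\Pr[|Y - \mu| \ge \sigma/\sqrt{\delta}] \le \delta$ applied with the multicalibrated moment estimates.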

[1] T. Philips, et al. The Moment Bound is Tighter than Chernoff's Bound for Positive Tail Probabilities, 1995.

[2] E. Lehrer. Any Inspection is Manipulable, 2001.

[3] Aaron Roth, et al. Fairness in Learning: Classic and Contextual Bandits, 2016, NIPS.

[4] Raef Bassily, et al. Algorithmic stability for adaptive data analysis, 2015, STOC.

[5] Guy N. Rothblum, et al. Probably Approximately Metric-Fair Learning, 2018, ICML.

[6] Christopher Jung, et al. A new analysis of differential privacy's generalization guarantees (invited paper), 2019, ITCS.

[7] Alvaro Sandroni, et al. Calibration with Many Checking Rules, 2003, Math. Oper. Res.

[8] Alexandra Chouldechova, et al. A snapshot of the frontiers of fairness in machine learning, 2020, Commun. ACM.

[9] Stefano Ermon, et al. Individual Calibration with Randomized Forecasting, 2020, ICML.

[10] Aravind Srinivasan, et al. Chernoff-Hoeffding bounds for applications with limited independence, 1995, SODA '93.

[11] R. Barber. Is distribution-free inference possible for binary regression?, 2020, arXiv:2004.09477.

[12] Guy N. Rothblum, et al. Fairness Through Computationally-Bounded Awareness, 2018, NeurIPS.

[13] Guy N. Rothblum, et al. Multicalibration: Calibration for the (Computationally-Identifiable) Masses, 2018, ICML.

[14] James Y. Zou, et al. Multiaccuracy: Black-Box Post-Processing for Fairness in Classification, 2018, AIES.

[15] Guy N. Rothblum, et al. A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis, 2010, FOCS.

[16] Seth Neel, et al. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness, 2017, ICML.

[17] Vladimir Vovk, et al. A tutorial on conformal prediction, 2007, J. Mach. Learn. Res.

[18] Toniann Pitassi, et al. Fairness through awareness, 2011, ITCS '12.

[19] David Oakes. Self-Calibrating Priors Do Not Exist, 1985.

[20] Guy N. Rothblum, et al. Learning from Outcomes: Evidence-Based Rankings, 2019, FOCS.

[21] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[22] Aaron Roth, et al. Average Individual Fairness: Algorithms, Generalization and Experiments, 2019, NeurIPS.

[23] Seth Neel, et al. Meritocratic Fairness for Infinite and Contextual Bandits, 2018, AIES.

[24] Sample Complexity of Uniform Convergence for Multicalibration, 2020, NeurIPS.

[25] E. Candès, et al. The limits of distribution-free conditional predictive inference, 2019, Information and Inference: A Journal of the IMA.

[26] Seth Neel, et al. An Empirical Study of Rich Subgroup Fairness for Machine Learning, 2018, FAT.

[27] John Langford, et al. A Reductions Approach to Fair Classification, 2018, ICML.

[28] Toniann Pitassi, et al. Preserving Statistical Validity in Adaptive Data Analysis, 2014, STOC.

[29] Panos M. Pardalos, et al. Greedy approximations for minimum submodular cover with submodular cost, 2010, Comput. Optim. Appl.

[30] A. Dawid. The Well-Calibrated Bayesian, 1982.

[31] Larry Wasserman, et al. Distribution-free prediction bands for non-parametric regression, 2014.