Loss minimization yields multicalibration for large neural networks

Multicalibration is a notion of fairness that aims to provide accurate predictions across a large collection of groups. Multicalibration is known to be a goal distinct from loss minimization, even for simple predictor classes such as linear functions. In this note, we show that for (almost all) large neural network sizes, optimally minimizing squared error leads to multicalibration. Our results concern the representational power of neural networks, not algorithmic or sample-complexity considerations. Previous results of this kind were known only for predictors that were nearly Bayes-optimal and were therefore representation-independent. We emphasize that our results do not apply to specific algorithms for optimizing neural networks, such as SGD, and they should not be interpreted as "fairness comes for free from optimizing neural networks".
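For concreteness, here is a minimal sketch of the notion at play, following the standard formulation of Hébert-Johnson et al. (ICML 2018); the auditor class \(\mathcal{C}\) and tolerance \(\alpha\) below are generic placeholders, not the specific parameters of this note. A predictor \(f : \mathcal{X} \to [0,1]\) is \((\mathcal{C}, \alpha)\)-multicalibrated if no auditor \(c \in \mathcal{C}\) detects correlated error on any level set of \(f\):

\[
  \Bigl|\, \mathbb{E}\bigl[\, c(x)\,\bigl(y - f(x)\bigr) \,\bigm|\, f(x) = v \,\bigr] \,\Bigr| \;\le\; \alpha
  \qquad \text{for every } c \in \mathcal{C} \text{ and every } v \in \operatorname{range}(f).
\]

In particular, the Bayes-optimal predictor \(f^*(x) = \mathbb{E}[y \mid x]\) satisfies this with \(\alpha = 0\) for any class \(\mathcal{C}\), since \(y - f^*(x)\) has conditional mean zero given \(x\); the point of the note is that optimal squared-error minimizers over large neural network architectures achieve such a guarantee without needing to be (nearly) Bayes-optimal.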
