Plex: Towards Reliability using Pretrained Large Model Extensions

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models’ abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we develop ViT-Plex and T5-Plex, pretrained large model extensions (Plex) for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks and simplifies the traditional protocol, as it does not require designing scores or tuning the model for each individual task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex’s capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
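To make two of the reliability tasks named above concrete, here is a minimal sketch (ours, not the paper's evaluation code) of selective prediction, where the model abstains on its least-confident inputs and is scored only on the rest, and of the log-likelihood proper scoring rule. The function names and the use of the maximum softmax probability as the confidence score are illustrative assumptions.

```python
import numpy as np

def selective_accuracy(probs: np.ndarray, labels: np.ndarray, coverage: float) -> float:
    """Accuracy on the `coverage` fraction of examples the model is most confident about."""
    confidences = probs.max(axis=1)               # max softmax probability as confidence score
    n_keep = max(1, int(coverage * len(labels)))  # number of predictions we commit to
    keep = np.argsort(-confidences)[:n_keep]      # indices of the most confident predictions
    return float((probs[keep].argmax(axis=1) == labels[keep]).mean())

def negative_log_likelihood(probs: np.ndarray, labels: np.ndarray) -> float:
    """Proper scoring rule: mean negative log-likelihood of the true labels."""
    eps = 1e-12  # guard against log(0)
    return float(-np.log(probs[np.arange(len(labels)), labels] + eps).mean())

# Toy usage with random predictions over 3 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=100)   # rows are predictive distributions
labels = rng.integers(0, 3, size=100)
print(selective_accuracy(probs, labels, coverage=0.5))
print(negative_log_likelihood(probs, labels))
```

Sweeping `coverage` from 1.0 down toward 0 traces the risk-coverage curve commonly used to compare selective predictors; a lower negative log-likelihood indicates a better-calibrated predictive distribution.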
