Not Just Pretty Pictures: Text-to-Image Generators Enable Interpretable Interventions for Robust Representations

Neural image classifiers are known to suffer severe performance degradation when exposed to inputs that exhibit covariate shift with respect to the training distribution. In this paper, we show that the ability of recent Text-to-Image (T2I) generators to edit images via natural-language prompts, thereby approximating interventions, is a promising technology for training more robust classifiers. Using current open-source models, we find that a variety of prompting strategies are effective for producing augmented training datasets sufficient to achieve state-of-the-art performance (1) on widely adopted Single-Domain Generalization benchmarks, (2) in reducing classifiers' dependency on spurious features, and (3) in facilitating the application of Multi-Domain Generalization techniques when fewer training domains are available.
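
As a rough illustration of the prompt-driven augmentation idea described in the abstract, the sketch below builds one edit prompt per target domain and appends label-preserving edited copies to a training set. It is not the paper's pipeline: the names `Example`, `build_prompts`, `augment`, the prompt template, and the `edit_fn` callable (a stand-in for whatever T2I image-editing model is used) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class Example:
    image: Any   # any image handle (array, PIL image, file path, ...)
    label: str   # class label, assumed unchanged by the edit


def build_prompts(label: str, domains: Sequence[str],
                  template: str = "a {domain} of a {label}") -> List[str]:
    """Hypothetical prompting strategy: one prompt per target domain."""
    return [template.format(domain=d, label=label) for d in domains]


def augment(dataset: Sequence[Example],
            domains: Sequence[str],
            edit_fn: Callable[[Any, str], Any]) -> List[Example]:
    """Return the original examples plus one edited copy per (example, domain).

    edit_fn(image, prompt) is assumed to wrap a T2I editing model; it is
    supplied by the caller and is not defined here.
    """
    augmented = list(dataset)
    for ex in dataset:
        for prompt in build_prompts(ex.label, domains):
            # The edit changes the domain (e.g. style) and keeps the label.
            augmented.append(Example(edit_fn(ex.image, prompt), ex.label))
    return augmented
```

With a dataset of N examples and D target domains, `augment` returns N * (1 + D) examples, on which a classifier could then be trained as usual.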
