Paper Information - Test-Time Training for Out-of-Distribution Generalization

We introduce a general approach, called test-time training, for improving the performance of predictive models when test and training data come from different distributions. Test-time training turns a single unlabeled test instance into a self-supervised learning problem, on which we update the model parameters before making a prediction on this instance. We show that this simple idea leads to surprising improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts. Theoretical investigations on a convex model reveal helpful intuitions for when we can expect our approach to help.
