Paper Information - Test-Time Training for Out-of-Distribution Generalization

We introduce a general approach, called test-time training, for improving the performance of predictive models when test and training data come from different distributions. Test-time training turns a single unlabeled test instance into a self-supervised learning problem, on which we update the model parameters before making a prediction on this instance. We show that this simple idea leads to surprising improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts. Theoretical investigations on a convex model reveal helpful intuitions for when we can expect our approach to help.
