R-Tuning: Regularized Prompt Tuning in Open-Set Scenarios

In realistic open-set scenarios where labels of a part of testing data are totally unknown, current prompt methods on vision-language (VL) models always predict the unknown classes as the downstream training classes. The exhibited label bias causes difficulty in the open set recognition (OSR), by which an image should be correctly predicted as one of the known classes or the unknown one. To learn prompts in open-set scenarios, we propose the Regularized prompt Tuning (R-Tuning) to mitigate the label bias. It introduces open words from the WordNet to extend the range of words forming the prompt texts from only closed-set label words to more. Thus, prompts are tuned in a simulated open-set scenario. Besides, inspired by the observation that classifying directly on large datasets causes a much higher false positive rate than on small datasets, we propose the Combinatorial Tuning and Testing (CTT) strategy for improving performance. CTT decomposes R-Tuning on large datasets as multiple independent group-wise tuning on fewer classes, then makes comprehensive predictions by selecting the optimal sub-prompt. For fair comparisons, we construct new baselines for OSR based on VL models, especially for prompt methods. Our method achieves the best results on datasets with various scales. Extensive ablation studies validate the effectiveness of our method.

[1]  Pau Rodríguez López,et al.  MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting , 2022, EACL.

[2]  Hiroaki Hayashi,et al.  Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing , 2021, ACM Comput. Surv..

[3]  Chen Change Loy,et al.  Unified Vision and Language Prompt Learning , 2022, ArXiv.

[4]  F. Khan,et al.  MaPLe: Multi-modal Prompt Learning , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Adrian Bulat,et al.  Language-Aware Soft Prompting for Vision & Language Foundation Models , 2022, ArXiv.

[6]  Hao Zhang,et al.  Prompt Tuning with Soft Context Sharing for Vision-Language Models , 2022, ArXiv.

[7]  Junho Park,et al.  Difficulty-Aware Simulator for Open Set Recognition , 2022, ECCV.

[8]  Zhanzhan Cheng,et al.  PMAL: Open Set Recognition via Robust Prototype Mining , 2022, AAAI.

[9]  Chen Change Loy,et al.  Conditional Prompt Learning for Vision-Language Models , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  A. Vedaldi,et al.  Open-Set Recognition: A Good Closed-Set Classifier is All You Need , 2021, ICLR.

[11]  Chen Change Loy,et al.  Learning to Prompt for Vision-Language Models , 2021, International Journal of Computer Vision.

[12]  Yonghong Tian,et al.  Adversarial Reciprocal Points Learning for Open Set Recognition , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Fei Yin,et al.  Convolutional Prototype Network for Open Set Recognition , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Yanning Zhang,et al.  Class-Aware Visual Prompt Tuning for Vision-Language Pre-Trained Model , 2022, ArXiv.

[15]  Junnan Li,et al.  Align before Fuse: Vision and Language Representation Learning with Momentum Distillation , 2021, NeurIPS.

[16]  Stanislav Fort,et al.  Exploring the Limits of Out-of-Distribution Detection , 2021, NeurIPS.

[17]  Rui Huang,et al.  MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Alessandro Sperduti,et al.  Conditional Variational Capsule Network for Open Set Recognition , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Brian Lester,et al.  The Power of Scale for Parameter-Efficient Prompt Tuning , 2021, EMNLP.

[20]  Shu Kong,et al.  OpenGAN: Open-Set Recognition via Open Data Generation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  De-Chuan Zhan,et al.  Learning Placeholders for Open-Set Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Xian-Sheng Hua,et al.  Counterfactual Zero-Shot and Open-Set Visual Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Ilya Sutskever,et al.  Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[24]  Quoc V. Le,et al.  Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.

[25]  Niko Sünderhauf,et al.  Class Anchor Clustering: A Loss for Distance-based Open Set Recognition , 2021, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[26]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[27]  D. Song,et al.  The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Dawn Song,et al.  Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Percy Liang,et al.  Prefix-Tuning: Optimizing Continuous Prompts for Generation , 2021, ACL.

[30]  Shiliang Pu,et al.  Learning Open Set Network with Discriminative Reciprocal Points , 2020, ECCV.

[31]  Ang Li,et al.  Hybrid Models for Open Set Recognition , 2020, ECCV.

[32]  Xin Sun,et al.  Conditional Gaussian Distribution Learning for Open Set Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  S. Gelly,et al.  Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.

[34]  Frank F. Xu,et al.  How Can We Know What Language Models Know? , 2019, Transactions of the Association for Computational Linguistics.

[35]  Ulli Waltinger,et al.  BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA , 2019, ArXiv.

[36]  Sebastian Riedel,et al.  Language Models as Knowledge Bases? , 2019, EMNLP.

[37]  Eric P. Xing,et al.  Learning Robust Global Representations by Penalizing Local Predictive Power , 2019, NeurIPS.

[38]  Stella X. Yu,et al.  Large-Scale Long-Tailed Recognition in an Open World , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Vishal M. Patel,et al.  C2AE: Class Conditioned Auto-Encoder for Open-Set Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Benjamin Recht,et al.  Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.

[41]  Takeshi Naemura,et al.  Classification-Reconstruction Learning for Open-Set Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[43]  Weng-Keen Wong,et al.  Open Set Learning with Counterfactual Images , 2018, ECCV.

[44]  Terrance E. Boult,et al.  The Extreme Value Machine , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[46]  Ricardo da Silva Torres,et al.  Nearest neighbors distance ratio open-set classifier , 2016, Machine Learning.

[47]  Terrance E. Boult,et al.  Towards Open Set Deep Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[49]  Terrance E. Boult,et al.  Towards Open World Recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[51]  Ya Le,et al.  Tiny ImageNet Visual Recognition Challenge , 2015 .

[52]  Anderson Rocha,et al.  Toward Open Set Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[54]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.