Shih-Fu Chang | Derek Hao Hu | Alireza Zareian | Kevin Dela Rosa
[1] Venkatesh Saligrama, et al. Don’t Even Look Once: Synthesizing Features for Zero-Shot Detection, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Nick Barnes, et al. Improved Visual-Semantic Alignment for Zero-Shot Object Detection, 2020, AAAI.
[3] Shih-Fu Chang, et al. Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions, 2020, ArXiv.
[4] Shafin Rahman, et al. Transductive Learning for Zero-Shot Object Detection, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[5] Wei Li, et al. Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] Chang Liu, et al. C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Justin Johnson, et al. VirTex: Learning Visual Representations from Textual Annotations, 2020, ArXiv.
[9] Alexander M. Bronstein, et al. Learning to Detect and Retrieve Objects From Unlabeled Videos, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[10] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[11] Ross B. Girshick, et al. Mask R-CNN, 2017, arXiv:1703.06870.
[12] Cordelia Schmid, et al. Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] Wei Li, et al. Learning to discover and localize visual objects with open vocabulary, 2018, ArXiv.
[14] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[15] Venkatesh Saligrama, et al. Zero Shot Detection, 2018, IEEE Transactions on Circuits and Systems for Video Technology.
[16] Andrea Vedaldi, et al. Weakly Supervised Deep Detection Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Martial Hebert, et al. Model recommendation: Generating object detectors from few samples, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Yong Jae Lee, et al. Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Cho-Jui Hsieh, et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[20] Rama Chellappa, et al. Zero-Shot Object Detection, 2018, ECCV.
[21] Shih-Fu Chang, et al. Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Xinlei Chen, et al. Microsoft COCO Captions: Data Collection and Evaluation Server, 2015, ArXiv.
[23] Vittorio Ferrari, et al. Revisiting Knowledge Transfer for Training Object Class Detectors, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[25] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[26] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[28] C. V. Jawahar, et al. A Multi-Space Approach to Zero-Shot Object Detection, 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[29] Ramakant Nevatia, et al. Knowledge Aided Consistency for Weakly Supervised Phrase Grounding, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] Rui Wang, et al. DLWL: Improving Detection for Lowshot Classes With Weakly Labelled Data, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Marc'Aurelio Ranzato, et al. DeViSE: A Deep Visual-Semantic Embedding Model, 2013, NIPS.
[32] Jordi Pont-Tuset, et al. The Open Images Dataset V4, 2018, International Journal of Computer Vision.
[33] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[34] Lina Yao, et al. Zero-Shot Object Detection with Textual Descriptions, 2019, AAAI.
[35] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008, Journal of Machine Learning Research.
[36] C. Lawrence Zitnick, et al. Edge Boxes: Locating Object Proposals from Edges, 2014, ECCV.
[37] Ramakant Nevatia, et al. Automatic Concept Discovery from Parallel Text and Visual Corpora, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[38] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[39] Trevor Darrell, et al. LSDA: Large Scale Detection through Adaptation, 2014, NIPS.
[40] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[41] Ramakant Nevatia, et al. NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[42] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[43] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[44] Yuxing Tang, et al. Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Ajay Divakaran, et al. Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Jianlong Fu, et al. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers, 2020, ArXiv.