Zhe Gan | Jingjing Liu | Shuohang Wang | Tianlong Chen | Yu Cheng | Yen-Chun Chen | Linjie Li