Xiaojie Jin | Qibin Hou | Yujun Shi | Daquan Zhou | Zihang Jiang | Jiashi Feng | Yuan Li | Bingyi Kang | Weihao Yu
[1] Shuicheng Yan, et al. Scale-Aware Fast R-CNN for Pedestrian Detection, 2015, IEEE Transactions on Multimedia.
[2] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[4] Xiaogang Wang, et al. Residual Attention Network for Image Classification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Chris Brockett, et al. Automatically Constructing a Corpus of Sentential Paraphrases, 2005, IJCNLP.
[6] Chunhua Shen, et al. End-to-End Video Instance Segmentation with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Irwan Bello. LambdaNetworks: Modeling Long-Range Interactions Without Attention, 2021, ICLR.
[8] Enhua Wu, et al. Transformer in Transformer, 2021, NeurIPS.
[9] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.
[10] Xiaojie Jin, et al. Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet, 2021, ArXiv.
[11] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[12] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[13] Baining Guo, et al. Learning Texture Transformer Network for Image Super-Resolution, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Samuel R. Bowman, et al. Neural Network Acceptability Judgments, 2018, Transactions of the Association for Computational Linguistics.
[15] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[16] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[17] Bin Li, et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection, 2020, ICLR.
[18] Yaowei Wang, et al. Conformer: Local Features Coupling Global Representations for Visual Recognition, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[20] Eneko Agirre, et al. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, 2017, *SEMEVAL.
[21] Jiashi Feng, et al. Revisit Knowledge Distillation: a Teacher-free Framework, 2019, ArXiv.
[22] Xuanjing Huang, et al. Recurrent Neural Network for Text Classification with Multi-Task Learning, 2016, IJCAI.
[23] Luowei Zhou, et al. End-to-End Dense Video Captioning with Masked Transformer, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Xiaohua Zhai, et al. Are we done with ImageNet?, 2020, ArXiv.
[25] Matthieu Cord, et al. Going deeper with Image Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Geoffrey E. Hinton, et al. Similarity of Neural Network Representations Revisited, 2019, ICML.
[27] Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[28] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[29] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[30] K. Simonyan, et al. High-Performance Large-Scale Image Recognition Without Normalization, 2021, ICML.
[31] Junying Chen, et al. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Yi Tay, et al. Synthesizer: Rethinking Self-Attention for Transformer Models, 2020, ICML.
[33] Hongbo Zhang, et al. Quora Question Pairs, 2017.
[34] Matthijs Douze, et al. Fixing the train-test resolution discrepancy, 2019, NeurIPS.
[35] Hongyang Chao, et al. Learning Joint Spatial-Temporal Transformations for Video Inpainting, 2020, ECCV.
[36] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[37] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[38] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[39] Raquel Urtasun, et al. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks, 2016, NIPS.
[40] Wen Gao, et al. Pre-Trained Image Processing Transformer, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[42] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[43] Luc Van Gool, et al. LocalViT: Bringing Locality to Vision Transformers, 2021, ArXiv.
[44] Dilin Wang, et al. Improve Vision Transformers Training by Suppressing Over-smoothing, 2021, ArXiv.
[45] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Andrew W. Senior, et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling, 2014, INTERSPEECH.
[47] Ralph R. Martin, et al. PCT: Point cloud transformer, 2020, Computational Visual Media.
[48] Jiashi Feng, et al. Neural Epitome Search for Architecture-Agnostic Network Compression, 2020, ICLR.
[49] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[50] Pieter Abbeel, et al. Bottleneck Transformers for Visual Recognition, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Xiaogang Wang, et al. End-to-End Object Detection with Adaptive Clustering Transformer, 2020, BMVC.
[52] Ido Dagan, et al. The Sixth PASCAL Recognizing Textual Entailment Challenge, 2009, TAC.
[53] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[55] Roy Bar-Haim, et al. The Second PASCAL Recognising Textual Entailment Challenge, 2006.
[56] Xiaojie Jin, et al. DeepViT: Towards Deeper Vision Transformer, 2021, ArXiv.
[57] Quanfu Fan, et al. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[58] Levent Sagun, et al. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases, 2021, ICML.
[59] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, ArXiv.
[60] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[61] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[62] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[63] Andreas Loukas, et al. Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, 2021, ICML.
[64] Honglak Lee, et al. Learning Invariant Representations with Local Transformations, 2012, ICML.