Matthieu Cord | Gabriel Synnaeve | Alexandre Sablayrolles | Hugo Touvron | Hervé Jégou
[1] Samuel L. Smith, et al. Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks, 2020, NeurIPS.
[2] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[3] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[4] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[5] Quoc V. Le, et al. Meta Pseudo Labels, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Gabriel Synnaeve, et al. Self-Training and Pre-Training are Complementary for Speech Recognition, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Ivan Laptev, et al. Training Vision Transformers for Image Retrieval, 2021, ArXiv.
[8] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[9] Torsten Hoefler, et al. Augment Your Batch: Improving Generalization Through Instance Repetition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Abhinav Gupta, et al. Non-local Neural Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[11] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[12] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[13] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[14] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[15] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[16] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[17] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Matthijs Douze, et al. Fixing the train-test resolution discrepancy: FixEfficientNet, 2020, ArXiv.
[19] Kurt Keutzer, et al. Visual Transformers: Token-based Image Representation and Processing for Computer Vision, 2020, ArXiv.
[20] Frank Hutter, et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[21] Garrison W. Cottrell, et al. ReZero is All You Need: Fast Convergence at Large Depth, 2020, UAI.
[22] 知秀 柴田. Understand in 5 Minutes!? Skim-Reading Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[23] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[24] Quoc V. Le, et al. RandAugment: Practical automated data augmentation with a reduced search space, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[25] Edouard Grave, et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures, 2019, ArXiv.
[26] Victor O. K. Li, et al. Non-Autoregressive Neural Machine Translation, 2017, ICLR.
[27] Maksims Volkovs, et al. Improving Transformer Optimization Through Better Initialization, 2020, ICML.
[28] Jürgen Schmidhuber, et al. Training Very Deep Networks, 2015, NIPS.
[29] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[30] Quoc V. Le, et al. Attention Augmented Convolutional Networks, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Prafulla Dhariwal, et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.
[32] Xiaofei Wang, et al. A Comparative Study on Transformer vs RNN in Speech Applications, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[33] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, ArXiv.
[34] Jian Yang, et al. Selective Kernel Networks, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Kaiming He, et al. Designing Network Design Spaces, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[37] Xiaohua Zhai, et al. Are we done with ImageNet?, 2020, ArXiv.
[38] Jascha Sohl-Dickstein, et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks, 2018, ICML.
[39] Tengyu Ma, et al. Fixup Initialization: Residual Learning Without Normalization, 2019, ICLR.
[40] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[41] Quoc V. Le, et al. Adversarial Examples Improve Image Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Matthijs Douze, et al. Fixing the train-test resolution discrepancy, 2019, NeurIPS.
[43] Dustin Tran, et al. Image Transformer, 2018, ICML.
[44] Jean-Michel Morel, et al. A non-local algorithm for image denoising, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[45] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[46] Vladlen Koltun, et al. Exploring Self-Attention for Image Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Pieter Abbeel, et al. Bottleneck Transformers for Visual Recognition, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[49] Quoc V. Le, et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition, 2020, ArXiv.
[50] Jonathan Krause, et al. 3D Object Representations for Fine-Grained Categorization, 2013, 2013 IEEE International Conference on Computer Vision Workshops.
[51] Lu Yuan, et al. Dynamic Convolution: Attention Over Convolution Kernels, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Enhua Wu, et al. Squeeze-and-Excitation Networks, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[53] Jason Lee, et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement, 2018, EMNLP.
[54] Iasonas Kokkinos, et al. MultiGrain: a unified image embedding for classes and instances, 2019, ArXiv.
[55] Chongruo Wu, et al. ResNeSt: Split-Attention Networks, 2020, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[56] Ashish Vaswani, et al. Stand-Alone Self-Attention in Vision Models, 2019, NeurIPS.
[57] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Enhua Wu, et al. Transformer in Transformer, 2021, NeurIPS.
[59] Samuel L. Smith, et al. Characterizing signal propagation to close the performance gap in unnormalized ResNets, 2021, ICLR.
[60] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[61] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[62] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[63] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Benjamin Recht, et al. Do ImageNet Classifiers Generalize to ImageNet?, 2019, ICML.
[65] Edouard Grave, et al. Training with Quantization Noise for Extreme Model Compression, 2020, ICLR.
[66] Irwan Bello. LambdaNetworks: Modeling Long-Range Interactions Without Attention, 2021, ICLR.
[67] Andrew Zisserman, et al. Automated Flower Classification over a Large Number of Classes, 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[68] K. Simonyan, et al. High-Performance Large-Scale Image Recognition Without Normalization, 2021, ICML.
[69] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[70] Hermann Ney, et al. RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation, 2019, INTERSPEECH.
[71] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[72] Yi Li, et al. Deformable Convolutional Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[73] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[74] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.