Zangwei Zheng | Yang You | Fuzhao Xue | Yuxuan Lou