Tobias Höllerer | Pradeep Sen | Noah Stier | Yi Ding | Alex Rich | Mason Wang | Matthew A. Turk
[1] Ioannis Mitliagkas, et al. Manifold Mixup: Better Representations by Interpolating Hidden States, 2018, ICML.
[2] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[3] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[4] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[5] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] B. P. Yuhas, et al. Integration of acoustic and visual speech signals using neural networks, 1989, IEEE Communications Magazine.
[7] Andrew Zisserman, et al. VGGSound: A Large-Scale Audio-Visual Dataset, 2020, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[9] Ruslan Salakhutdinov, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences, 2019, ACL.
[10] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Erik Cambria, et al. Memory Fusion Network for Multi-view Sequential Learning, 2018, AAAI.
[12] Zeyi Huang, et al. Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition, 2021, NeurIPS.
[13] Nitish Srivastava, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, Journal of Machine Learning Research.
[14] Olatunji Ruwase, et al. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters, 2020, KDD.
[15] Ramesh Nallapati, et al. Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering, 2019, EMNLP.
[16] Angela Dai, et al. TransformerFusion: Monocular RGB Scene Reconstruction using Transformers, 2021, NeurIPS.
[17] Erik Cambria, et al. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph, 2018, ACL.
[18] David Berthelot, et al. MixMatch: A Holistic Approach to Semi-Supervised Learning, 2019, NeurIPS.
[19] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[20] Andrew Zisserman, et al. Perceiver: General Perception with Iterative Attention, 2021, ICML.
[21] Li Yang, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[22] Jianfei Cai, et al. Scalable Vision Transformers with Hierarchical Pooling, 2021, IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Cordelia Schmid, et al. Attention Bottlenecks for Multimodal Fusion, 2021, arXiv.
[24] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[25] Olatunji Ruwase, et al. ZeRO-Offload: Democratizing Billion-Scale Model Training, 2021, USENIX ATC.
[26] Yi Ding, et al. Augmentation Strategies for Learning with Noisy Labels, 2021, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[28] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2019, ICLR.
[29] Louis-Philippe Morency, et al. Integrating Multimodal Information in Large Pretrained Transformers, 2020, ACL.
[30] Louis-Philippe Morency, et al. Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors, 2018, AAAI.
[31] Minjia Zhang, et al. Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping, 2020, NeurIPS.
[32] Andrew Zisserman, et al. Convolutional Two-Stream Network Fusion for Video Action Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Anamitra R. Choudhury, et al. PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination, 2020, ICML.
[34] Zheng Zhang, et al. BP-Transformer: Modelling Long-Range Context via Binary Partitioning, 2019, arXiv.
[35] Tobias Höllerer, et al. VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion, 2021, International Conference on 3D Vision (3DV).