Ankur Bapna | Mia Xu Chen | Orhan Firat | Yuan Cao | Yonghui Wu
[1] Ankur Bapna, et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation, 2018, ACL.
[2] Jacob Devlin, et al. Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU, 2017, EMNLP.
[3] A. Emin Orhan, et al. Skip Connections as Effective Symmetry-Breaking, 2017, ArXiv.
[4] Wei Xu, et al. Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation, 2016, TACL.
[5] Yoshua Bengio, et al. Learning Long-Term Dependencies with Gradient Descent Is Difficult, 1994, IEEE Trans. Neural Networks.
[6] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[7] Jürgen Schmidhuber, et al. Highway Networks, 2015, ArXiv.
[8] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[9] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, 2001.
[10] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[11] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[12] Gang Zeng, et al. Weighted Residuals for Very Deep Networks, 2016, ICSAI.
[13] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[14] Tomaso Poggio, et al. Learning Functions: When Is Deep Better Than Shallow, 2016, ArXiv.
[15] Rico Sennrich, et al. Deep Architectures for Neural Machine Translation, 2017, WMT.
[16] Dawn Xiaodong Song, et al. Gradients Explode - Deep Networks Are Shallow - ResNet Explained, 2017, ICLR.
[17] Quoc V. Le, et al. Massive Exploration of Neural Machine Translation Architectures, 2017, EMNLP.
[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[19] Jascha Sohl-Dickstein, et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability, 2017, NIPS.
[20] Qun Liu, et al. Deep Neural Machine Translation with Linear Associative Unit, 2017, ACL.
[21] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[22] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[23] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[24] Razvan Pascanu, et al. Understanding the Exploding Gradient Problem, 2012, ArXiv.
[25] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[26] Ohad Shamir, et al. The Power of Depth for Feedforward Neural Networks, 2015, COLT.
[27] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[28] Matus Telgarsky, et al. Benefits of Depth in Neural Networks, 2016, COLT.
[29] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.