Geoffrey E. Hinton | Quoc V. Le | Azalia Mirhoseini | Noam Shazeer | Jeff Dean | Andy Davis | Krzysztof Maziarz
[1] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[2] Robert A. Jacobs,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.
[3] Hermann Ney,et al. Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[4] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[5] Jürgen Schmidhuber,et al. Learning to forget: continual prediction with LSTM , 1999 .
[6] Volker Tresp,et al. Mixtures of Gaussian Processes , 2000, NIPS.
[7] Carl E. Rasmussen,et al. Infinite Mixtures of Gaussian Process Experts , 2001, NIPS.
[8] Samy Bengio,et al. A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.
[9] Fei-Fei Li,et al. Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions , 2009, NIPS.
[10] Babak Shahbaba,et al. Nonlinear Models Using Dirichlet Process Mixtures , 2007, J. Mach. Learn. Res.
[11] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res.
[12] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[13] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[14] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[16] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[17] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[18] Marc'Aurelio Ranzato,et al. Learning Factored Representations in a Deep Mixture of Experts , 2013, ICLR.
[19] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[20] Nadir Durrani,et al. Edinburgh’s Phrase-based Machine Translation Systems for WMT-14 , 2014, WMT@ACL.
[21] Yoshua Bengio,et al. Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning , 2014, ArXiv.
[22] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[23] Itamar Arel,et al. Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks , 2013, ICLR.
[24] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[25] Quoc V. Le,et al. Addressing the Rare Word Problem in Neural Machine Translation , 2014, ACL.
[26] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[27] Deep Sequential Neural Networks , 2015 .
[28] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[29] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[30] Marc Peter Deisenroth,et al. Distributed Gaussian Processes , 2015, ICML.
[31] Matthias Bethge,et al. Generative Image Modeling Using Spatial LSTMs , 2015, NIPS.
[32] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[33] Joelle Pineau,et al. Conditional Computation in Neural Networks for faster models , 2015, ArXiv.
[34] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[36] Hugo Larochelle,et al. Dynamic Capacity Networks , 2015, ICML.
[37] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[38] Xinyun Chen. Delving into Transferable Adversarial Examples and Black-box Attacks , 2016 .
[39] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[40] Christof Monz,et al. Ensemble Learning for Multi-Source Neural Machine Translation , 2016, COLING.
[41] Wei Xu,et al. Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation , 2016, TACL.
[42] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[43] Alex Graves,et al. Memory-Efficient Backpropagation Through Time , 2016, NIPS.
[44] Tinne Tuytelaars,et al. Expert Gate: Lifelong Learning with a Network of Experts , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Martin Wattenberg,et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.