Dong Yu | Dan Su | Zhao You | Shulin Feng
[1] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[2] Beng Chin Ooi, et al. Dynamic Routing Networks, 2019, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[3] Philip C. Woodland, et al. Very deep convolutional neural networks for robust speech recognition, 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[4] Shiliang Zhang, et al. Deep-FSMN for Large Vocabulary Continuous Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.
[6] Navdeep Jaitly, et al. Hybrid speech recognition with Deep Bidirectional LSTM, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[7] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.
[8] Parisa Haghani, et al. Syllable-based acoustic modeling with CTC-SMBR-LSTM, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[9] Dong Yu, et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[10] Yoshua Bengio, et al. Light Gated Recurrent Units for Speech Recognition, 2018, IEEE Transactions on Emerging Topics in Computational Intelligence.
[11] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[12] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[13] Trevor Darrell, et al. Deep Mixture of Experts via Shallow Embedding, 2018, UAI.
[14] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[15] Petr Motlícek, et al. Learning feature mapping using deep neural network bottleneck features for distant large vocabulary speech recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Marc'Aurelio Ranzato, et al. Hard Mixtures of Experts for Large Scale Weakly Supervised Vision, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[18] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[19] Frank Zhang, et al. Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition, 2020, ArXiv.
[20] Tara N. Sainath, et al. Deep convolutional neural networks for LVCSR, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Dong Yu, et al. Recent progresses in deep learning based acoustic models, 2017, IEEE/CAA Journal of Automatica Sinica.
[22] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[23] Sanjeev Khudanpur, et al. A time delay neural network architecture for efficient modeling of long temporal contexts, 2015, INTERSPEECH.
[24] Lorenzo Torresani, et al. Network of Experts for Large-Scale Image Categorization, 2016, ECCV.
[25] Chao Weng, et al. DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).