LegoNN: Building Modular Encoder-Decoder Models
暂无分享,去创建一个
M. Lewis | Luke Zettlemoyer | Abdel-rahman Mohamed | Florian Metze | Sergey Edunov | Shinji Watanabe | Dmytro Okhonko | Siddharth Dalmia
[1] Mohammad Norouzi,et al. Non-Autoregressive Machine Translation with Latent Alignments , 2020, EMNLP.
[2] Omer Levy,et al. Aligned Cross Entropy for Non-Autoregressive Machine Translation , 2020, ICML.
[3] Geoffrey E. Hinton,et al. Imputer: Sequence Modelling via Imputation and Dynamic Programming , 2020, ICML.
[4] Brian Kingsbury,et al. Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300 , 2020, INTERSPEECH.
[5] D. Roth,et al. Neural Module Networks for Reasoning over Text , 2019, ICLR.
[6] A. Sanchís,et al. Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[8] Marc'Aurelio Ranzato,et al. Task-Driven Modular Networks for Zero-Shot Compositional Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Luke S. Zettlemoyer,et al. Transformers with convolutional context for ASR , 2019, ArXiv.
[10] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[11] Omer Levy,et al. Mask-Predict: Parallel Decoding of Conditional Masked Language Models , 2019, EMNLP.
[12] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[13] Alexei A. Efros,et al. Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity , 2019, NeurIPS.
[14] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[15] Jindrich Libovický,et al. End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification , 2018, EMNLP.
[16] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[17] Carolyn Penstein Rosé,et al. Stress Test Evaluation for Natural Language Inference , 2018, COLING.
[18] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[19] Matt Post,et al. A Call for Clarity in Reporting BLEU Scores , 2018, WMT.
[20] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[21] Jason Lee,et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement , 2018, EMNLP.
[22] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[23] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[24] Dan Klein,et al. Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.
[25] Sergey Levine,et al. Learning modular neural network policies for multi-task and multi-robot transfer , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[26] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Yiming Wang,et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI , 2016, INTERSPEECH.
[28] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[29] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[31] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[32] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[34] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[36] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[37] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[38] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[39] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[40] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[41] Paul Deléglise,et al. Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks , 2014, LREC.
[42] Andrew W. Senior,et al. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition , 2014, ArXiv.
[43] Phil Blunsom,et al. Recurrent Continuous Translation Models , 2013, EMNLP.
[44] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[45] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.
[46] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[47] Naftali Tishby,et al. The information bottleneck method , 2000, ArXiv.
[48] Kim B. Clark,et al. Design Rules: The Power of Modularity , 2000 .
[49] Yoshua Bengio,et al. Global training of document processing systems using graph transformer networks , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[50] Frederick Jelinek,et al. Statistical methods for speech recognition , 1997 .
[51] Robert A. Jacobs,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.
[52] John Cocke,et al. A Statistical Approach to Machine Translation , 1990, CL.