fairseq: A Fast, Extensible Toolkit for Sequence Modeling

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at https://www.youtube.com/watch?v=OtgDdWtHvto.

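fairseq also exposes released pretrained models through PyTorch's torch.hub interface. The snippet below is a minimal sketch of loading an English-French Transformer and translating a sentence; the checkpoint name 'transformer.wmt14.en-fr' and the tokenizer/BPE arguments are assumptions taken from fairseq's public documentation and may differ across releases.

    import torch

    # Load a pretrained English-French Transformer via torch.hub.
    # The checkpoint name and the tokenizer/bpe arguments are assumptions
    # based on fairseq's released models and may change between versions.
    en2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr',
                           tokenizer='moses', bpe='subword_nmt')
    en2fr.eval()

    # Translate a sentence; generation uses beam search by default.
    print(en2fr.translate('Hello world!'))

For training, the command-line tools (e.g. fairseq-train) enable mixed-precision training with the --fp16 flag.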