The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks (Hochreiter and Schmidhuber, 1997) are not only popular for a variety of natural language processing (NLP) tasks such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but are also applicable to seemingly unrelated fields such as bioinformatics (Min et al., 2016). Recent advances in contextual word embeddings such as BERT (Devlin et al., 2019) report state-of-the-art results on 11 NLP tasks with a single model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were tailored much more closely to the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and prior research in the specific area. This thesis can be understood as an antithesis to the prevailing paradigm. We show how traditional symbolic statistical machine translation (SMT; Koehn, 2009) models can still improve neural machine translation (NMT; Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015) while reducing the risk of common NMT pathologies such as hallucinations and neologisms. Other external symbolic models, such as spell checkers and morphology databases, help neural models correct grammatical errors in text.
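
As a rough, self-contained illustration of the kind of neural–symbolic combination argued for above (not the thesis' actual system), the sketch below interpolates the token scores of a hypothetical neural model with those of a hypothetical symbolic model log-linearly inside a toy beam search. The functions `nmt_log_probs` and `symbolic_log_probs`, the interpolation weight, and the toy vocabulary are illustrative assumptions; a symbolic scorer of this kind could, for instance, assign negative infinity to tokens it considers neologisms.

```python
import math

# Toy vocabulary used throughout this sketch (purely illustrative).
VOCAB = ["the", "cat", "sat", "</s>"]


def nmt_log_probs(prefix):
    """Stand-in for a neural scorer: log P(token | prefix) from an NMT model.
    Here it is a fixed toy distribution; a real system would run the network."""
    return {"the": -0.5, "cat": -1.2, "sat": -1.6, "</s>": -2.5}


def symbolic_log_probs(prefix):
    """Stand-in for a symbolic scorer, e.g. an n-gram LM or an FST constraint.
    Tokens it does not license would receive a score of -inf."""
    return {"the": -0.7, "cat": -1.0, "sat": -1.4, "</s>": -2.0}


def combined_score(prefix, token, weight=0.3):
    """Log-linear interpolation of the neural and symbolic token scores."""
    neural = nmt_log_probs(prefix).get(token, -math.inf)
    symbolic = symbolic_log_probs(prefix).get(token, -math.inf)
    return (1.0 - weight) * neural + weight * symbolic


def beam_step(beams, beam_size=2):
    """Expand every hypothesis by one token and keep the best `beam_size`."""
    candidates = []
    for prefix, score in beams:
        for token in VOCAB:
            candidates.append((prefix + [token],
                               score + combined_score(prefix, token)))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]


if __name__ == "__main__":
    beams = [([], 0.0)]           # (token prefix, cumulative log score)
    for _ in range(3):            # three decoding steps on the toy problem
        beams = beam_step(beams)
    for prefix, score in beams:
        print(" ".join(prefix), round(score, 3))
```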

[1]  Gonzalo Iglesias,et al.  Transducer Disambiguation with Sparse Topological Features , 2015, EMNLP.

[2]  Ming Zhou,et al.  Bilingual Data Cleaning for SMT using Graph-based Random Walk , 2013, ACL.

[3]  Marcin Junczys-Dowmunt,et al.  Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions , 2016, IWSLT.

[4]  Hwee Tou Ng,et al.  The CoNLL-2013 Shared Task on Grammatical Error Correction , 2013, CoNLL Shared Task.

[5]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[6]  Mike Schuster,et al.  Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Tomoki Toda,et al.  Ckylark: A More Robust PCFG-LA Parser , 2015, HLT-NAACL.

[8]  Shi Feng,et al.  Pathologies of Neural Models Make Interpretations Difficult , 2018, EMNLP.

[9]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[10]  Wei Wu,et al.  Phrase-level Self-Attention Networks for Universal Sentence Encoding , 2018, EMNLP.

[11]  H. Ng,et al.  A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction , 2018, AAAI.

[12]  Yong Cheng,et al.  Neural Machine Translation with Key-Value Memory-Augmented Attention , 2018, IJCAI.

[13]  Adrià de Gispert,et al.  Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices , 2016, EACL.

[14]  Hermann Ney,et al.  Word-Level Confidence Estimation for Machine Translation using Phrase-Based Translation Models , 2005, HLT.

[15]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[16]  Marc'Aurelio Ranzato,et al.  Analyzing Uncertainty in Neural Machine Translation , 2018, ICML.

[17]  Holger Schwenk,et al.  Investigations on large-scale lightly-supervised training for statistical machine translation. , 2008, IWSLT.

[18]  Jingbo Zhu,et al.  Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation , 2017, EMNLP.

[19]  Jacob Devlin,et al.  Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU , 2017, EMNLP.

[20]  Yoshua Bengio,et al.  On Using Very Large Target Vocabulary for Neural Machine Translation , 2014, ACL.

[21]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[22]  Joelle Pineau,et al.  Language GANs Falling Short , 2018, ICLR.

[23]  Angela Fan,et al.  Controllable Abstractive Summarization , 2017, NMT@ACL.

[24]  Kevin Duh,et al.  On the Elements of an Accurate Tree-to-String Machine Translation System , 2014, ACL.

[25]  Rico Sennrich,et al.  Regularization techniques for fine-tuning in neural machine translation , 2017, EMNLP.

[26]  Graham Neubig,et al.  A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models , 2017, AAAI.

[27]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[28]  Holger Schwenk,et al.  N-gram-based machine translation enhanced with neural networks , 2010, IWSLT.

[29]  Yang Liu,et al.  Recursive Autoencoders for ITG-Based Translation , 2013, EMNLP.

[30]  Rico Sennrich,et al.  A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation , 2018, WMT.

[31]  Mingbo Ma,et al.  When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size) , 2017, EMNLP.

[32]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[33]  Ondrej Bojar,et al.  Training Tips for the Transformer Model , 2018, Prague Bull. Math. Linguistics.

[34]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[35]  Stephan Vogel,et al.  The QCRI Recognition System for Handwritten Arabic , 2015, ICIAP.

[36]  Shahram Khadivi,et al.  Parallel Corpus Refinement as an Outlier Detection Algorithm , 2011, MTSUMMIT.

[37]  Petr Motlícek,et al.  Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition , 2012, INTERSPEECH.

[38]  Xing Li,et al.  STACL: Simultaneous Translation with Integrated Anticipation and Controllable Latency , 2018, ArXiv.

[39]  Dan Klein,et al.  When and why are log-linear models self-normalizing? , 2015, NAACL.

[40]  Di He,et al.  Non-Autoregressive Machine Translation with Auxiliary Regularization , 2019, AAAI.

[41]  Ankur Bapna,et al.  The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.

[42]  Stephen Clark,et al.  Syntax-Based Word Ordering Incorporating a Large-Scale Language Model , 2012, EACL.

[43]  Shahram Khadivi,et al.  Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search , 2017, EMNLP.

[44]  Zhiguo Wang,et al.  Coverage Embedding Models for Neural Machine Translation , 2016, EMNLP.

[45]  Alexander M. Rush,et al.  Adapting Sequence Models for Sentence Correction , 2017, EMNLP.

[46]  Mark Fishel,et al.  Multi-Domain Neural Machine Translation , 2018, EAMT.

[47]  Yang Liu,et al.  Modeling Coverage for Neural Machine Translation , 2016, ACL.

[48]  Min Zhang,et al.  Variational Neural Machine Translation , 2016, EMNLP.

[49]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[50]  Yongqiang Wang,et al.  Efficient lattice rescoring using recurrent neural network language models , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[51]  Diego Marcheggiani,et al.  Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks , 2018, NAACL.

[52]  Khalil Sima'an,et al.  A Shared Task on Multimodal Machine Translation and Crosslingual Image Description , 2016, WMT.

[53]  Toshiaki Nakazawa,et al.  ASPEC: Asian Scientific Paper Excerpt Corpus , 2016, LREC.

[54]  Oleksandr Makeyev,et al.  Neural network with ensembles , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[55]  John DeNero,et al.  Variable-Length Word Encodings for Neural Translation Models , 2015, EMNLP.

[56]  Satoshi Nakamura,et al.  Incorporating Discrete Translation Lexicons into Neural Machine Translation , 2016, EMNLP.

[57]  Razvan Pascanu,et al.  Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.

[58]  Quoc V. Le,et al.  Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[59]  Tommi S. Jaakkola,et al.  A causal framework for explaining the predictions of black-box sequence-to-sequence models , 2017, EMNLP.

[60]  Ole Winther,et al.  Neural Machine Translation with Characters and Hierarchical Encoding , 2016, ArXiv.

[61]  Kevin Knight,et al.  Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words , 2018, ArXiv.

[62]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[63]  Chengqi Zhang,et al.  Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling , 2018, IJCAI.

[64]  Geoffrey E. Hinton,et al.  Grammar as a Foreign Language , 2014, NIPS.

[65]  Tiejun Zhao,et al.  Forest-Based Neural Machine Translation , 2018, ACL.

[66]  Thomas Fang Zheng,et al.  Enhanced neural machine translation by learning from draft , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[67]  Pushpak Bhattacharyya,et al.  Learning variable length units for SMT between related languages via Byte Pair Encoding , 2016, SWCN@EMNLP.

[68]  Heike Adel,et al.  Exploring Different Dimensions of Attention for Uncertainty Detection , 2016, EACL.

[69]  Rongrong Ji,et al.  Asynchronous Bidirectional Decoding for Neural Machine Translation , 2018, AAAI.

[70]  W. Byrne,et al.  Generalization and maximum likelihood from small data sets , 1993, Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop.

[71]  Yonatan Belinkov,et al.  What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models , 2018, AAAI.

[72]  Desmond Elliott,et al.  Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description , 2017, WMT.

[73]  T. Kathirvalavakumar,et al.  Pruning algorithms of neural networks — a comparative study , 2013, Central European Journal of Computer Science.

[74]  Yue Zhang,et al.  Transition-Based Syntactic Linearization , 2015, NAACL.

[75]  Pushpak Bhattacharyya,et al.  Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages , 2018, NAACL.

[76]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[77]  Pontus Stenetorp,et al.  Transition-based Dependency Parsing Using Recursive Neural Networks , 2013 .

[78]  Ming Zhou,et al.  Reinforced Mnemonic Reader for Machine Reading Comprehension , 2017, IJCAI.

[79]  Graham Neubig,et al.  Learning to Translate in Real-time with Neural Machine Translation , 2016, EACL.

[80]  Tomas Mikolov,et al.  Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.

[81]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[82]  Philipp Koehn,et al.  An Analysis of Source Context Dependency in Neural Machine Translation , 2018, EAMT.

[83]  Veselin Stoyanov,et al.  Simple Fusion: Return of the Language Model , 2018, WMT.

[84]  Yang Liu,et al.  Joint Training for Pivot-based Neural Machine Translation , 2016, IJCAI.

[85]  Bowen Zhou,et al.  Pointing the Unknown Words , 2016, ACL.

[86]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[87]  Jörg Tiedemann,et al.  Neural machine translation for low-resource languages , 2017, ArXiv.

[88]  Yoav Goldberg,et al.  Towards String-To-Tree Neural Machine Translation , 2017, ACL.

[89]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[90]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[91]  Alexander M. Rush,et al.  Character-Aware Neural Language Models , 2015, AAAI.

[92]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[93]  Kenneth Heafield,et al.  Fast Neural Machine Translation Implementation , 2018, NMT@ACL.

[94]  Nadir Durrani,et al.  A Joint Sequence Translation Model with Integrated Reordering , 2011, ACL.

[95]  Yue Zhang,et al.  Transition-Based Syntactic Linearization with Lookahead Features , 2016, HLT-NAACL.

[96]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[97]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[98]  Shankar Kumar,et al.  Local Phrase Reordering Models for Statistical Machine Translation , 2005, HLT.

[99]  Xu Sun,et al.  Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation , 2018, EMNLP.

[100]  Taro Watanabe,et al.  Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection , 2018, WMT.

[101]  Yoshua Bengio,et al.  An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.

[102]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[103]  Jan Niehues,et al.  Pre-Translation for Neural Machine Translation , 2016, COLING.

[104]  Samuel R. Bowman,et al.  Do latent tree learning models identify meaningful structure in sentences? , 2017, TACL.

[105]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[106]  Yann Dauphin,et al.  Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.

[107]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[108]  Jindrich Libovický,et al.  End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification , 2018, EMNLP.

[109]  Markus Freitag,et al.  Attention-based Vocabulary Selection for NMT Decoding , 2017, ArXiv.

[110]  Young-Kil Kim,et al.  Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages , 2017, LREC.

[111]  Guillaume Lample,et al.  Word Translation Without Parallel Data , 2017, ICLR.

[112]  M. J. D. Powell,et al.  An efficient method for finding the minimum of a function of several variables without calculating derivatives , 1964, Comput. J..

[113]  Deyi Xiong,et al.  A GRU-Gated Attention Model for Neural Machine Translation , 2017, ArXiv.

[114]  Yoshua Bengio,et al.  Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning , 2017, Rep4NLP@ACL.

[115]  Yonatan Belinkov,et al.  NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks , 2018, AAAI.

[116]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[117]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[118]  Jerome R. Bellegarda,et al.  A latent semantic analysis framework for large-Span language modeling , 1997, EUROSPEECH.

[119]  Yun Chen,et al.  A Stable and Effective Learning Strategy for Trainable Greedy Decoding , 2018, EMNLP.

[120]  Christopher D. Manning,et al.  Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models , 2016, ACL.

[121]  Samy Bengio,et al.  Tensor2Tensor for Neural Machine Translation , 2018, AMTA.

[122]  David Chiang,et al.  Correcting Length Bias in Neural Machine Translation , 2018, WMT.

[123]  Marine Carpuat,et al.  Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation , 2017, NMT@ACL.

[124]  Leonid Sigal,et al.  Middle-Out Decoding , 2018, NeurIPS.

[125]  Chenhui Chu,et al.  A Survey of Multilingual Neural Machine Translation , 2019, ACM Comput. Surv..

[126]  James Henderson,et al.  Document-Level Neural Machine Translation with Hierarchical Attention Networks , 2018, EMNLP.

[127]  Desmond Elliott,et al.  Multilingual Image Description with Neural Sequence Models , 2015, ArXiv.

[128]  Lemao Liu,et al.  Neural Machine Translation with Source Dependency Representation , 2017, EMNLP.

[129]  Hua Wu,et al.  Multi-channel Encoder for Neural Machine Translation , 2017, AAAI.

[130]  Hans Uszkoreit,et al.  Deeper Machine Translation and Evaluation for German , 2016, DMTW.

[131]  Myle Ott,et al.  fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.

[132]  Yoshua Bengio,et al.  Audio Chord Recognition with Recurrent Neural Networks , 2013, ISMIR.

[133]  Holger Schwenk,et al.  Continuous Space Language Models for Statistical Machine Translation , 2006, ACL.

[134]  Christopher Kermorvant,et al.  The A2iA Arabic Handwritten Text Recognition System at the Open HaRT2013 Evaluation , 2014, 2014 11th IAPR International Workshop on Document Analysis Systems.

[135]  Alexander M. Rush,et al.  OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.

[136]  Bill Byrne,et al.  Domain Adaptive Inference for Neural Machine Translation , 2019, ACL.

[137]  Graham Neubig,et al.  MTNT: A Testbed for Machine Translation of Noisy Text , 2018, EMNLP.

[138]  Sunita Sarawagi,et al.  Length bias in Encoder Decoder Models and a Case for Global Conditioning , 2016, EMNLP.

[139]  Huda Khayrallah,et al.  Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation , 2018, WMT.

[140]  Colin Raffel,et al.  Learning Hard Alignments with Variational Inference , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[141]  Jonathan G. Fiscus,et al.  Overview of the NIST 2016 LoReHLT evaluation , 2017, Machine Translation.

[142]  Hermann Ney,et al.  Statistical multi-source translation , 2001, MTSUMMIT.

[143]  Huda Khayrallah,et al.  On the Impact of Various Types of Noise on Neural Machine Translation , 2018, NMT@ACL.

[144]  Rico Sennrich,et al.  Evaluating Discourse Phenomena in Neural Machine Translation , 2017, NAACL.

[145]  Andy Way,et al.  Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation , 2018, WMT.

[146]  Jianfeng Gao,et al.  Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[147]  Bill Byrne,et al.  Neural Grammatical Error Correction with Finite State Transducers , 2019, NAACL.

[148]  Hang Li,et al.  Neural Responding Machine for Short-Text Conversation , 2015, ACL.

[149]  William J. Byrne,et al.  Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars , 2010, CL.

[150]  Yann Dauphin,et al.  Convolutional Sequence to Sequence Learning , 2017, ICML.

[151]  Mariana L. Neves,et al.  The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine , 2016, LREC.

[152]  Tetsuji Nakagawa,et al.  An Empirical Study of Language Relatedness for Transfer Learning in Neural Machine Translation , 2017, PACLIC.

[153]  Andy Way,et al.  Neural Pre-Translation for Hybrid Machine Translation , 2017, MTSUMMIT.

[154]  Yang Liu,et al.  Towards Robust Neural Machine Translation , 2018, ACL.

[155]  Yoshua Bengio,et al.  A Character-level Decoder without Explicit Segmentation for Neural Machine Translation , 2016, ACL.

[156]  Alexandre Allauzen,et al.  Continuous Space Translation Models with Neural Networks , 2012, NAACL.

[157]  Yann Dauphin,et al.  A Convolutional Encoder Model for Neural Machine Translation , 2016, ACL.

[158]  Bill Byrne,et al.  Unfolding and Shrinking Neural Machine Translation Ensembles , 2017, EMNLP.

[159]  C. Lee Giles,et al.  The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations , 2017, ArXiv.

[160]  Naoaki Okazaki,et al.  Positional Encoding to Control Output Sequence Length , 2019, NAACL.

[161]  Atsushi Fujita,et al.  Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation , 2018, NMT@ACL.

[162]  Gholamreza Haffari,et al.  Incorporating Structural Alignment Biases into an Attentional Neural Translation Model , 2016, NAACL.

[163]  Yonatan Belinkov,et al.  Identifying and Controlling Important Neurons in Neural Machine Translation , 2018, ICLR.

[164]  Di He,et al.  Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input , 2018, AAAI.

[165]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[166]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[167]  Adrià de Gispert,et al.  The University of Cambridge’s Machine Translation Systems for WMT18 , 2018, WMT.

[168]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[169]  Jungi Kim,et al.  Boosting Neural Machine Translation , 2016, IJCNLP.

[170]  Joan Bruna,et al.  Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.

[171]  Daniel Jurafsky,et al.  Mutual Information and Diverse Decoding Improve Neural Machine Translation , 2016, ArXiv.

[172]  Geoffrey E. Hinton,et al.  Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.

[173]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[174]  Yang Liu,et al.  Neural Machine Translation with Reconstruction , 2016, AAAI.

[175]  Tara N. Sainath,et al.  Learning compact recurrent neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[176]  Yoshua Bengio,et al.  Better Mixing via Deep Representations , 2012, ICML.

[177]  Joakim Nivre,et al.  An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation , 2018, WMT.

[178]  Hermann Ney,et al.  On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation , 2018, WMT.

[179]  Brian Roark,et al.  The OpenGrm open-source finite-state grammar software libraries , 2012, ACL.

[180]  Karin M. Verspoor,et al.  Findings of the 2016 Conference on Machine Translation , 2016, WMT.

[181]  Quoc V. Le,et al.  The Evolved Transformer , 2019, ICML.

[182]  Guillaume Lample,et al.  What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties , 2018, ACL.

[183]  Byunghan Lee,et al.  Deep learning in bioinformatics , 2016, Briefings Bioinform..

[184]  Mikko Kurimo,et al.  Minimum Bayes Risk Combination of Translation Hypotheses from Alternative Morphological Decompositions , 2009, NAACL.

[185]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[186]  Marcin Junczys-Dowmunt,et al.  Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction , 2016, EMNLP.

[187]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[188]  Huda Khayrallah,et al.  Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation , 2019, NAACL.

[189]  Steve Renals,et al.  Dynamic Evaluation of Transformer Language Models , 2019, ArXiv.

[190]  Jiajun Zhang,et al.  End-to-End Speech Translation with Knowledge Distillation , 2019, INTERSPEECH.

[191]  Richard Socher,et al.  Towards Neural Machine Translation with Latent Tree Attention , 2017, SPNLP@EMNLP.

[192]  Alexander J. Smola,et al.  Neural Machine Translation with Recurrent Attention Modeling , 2016, EACL.

[193]  Wei Chen,et al.  A Character-Aware Encoder for Neural Machine Translation , 2016, COLING.

[194]  Richard Edwin Stearns,et al.  Syntax-Directed Transduction , 1966, JACM.

[195]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[196]  Marta R. Costa-jussà,et al.  Findings of the 2019 Conference on Machine Translation (WMT19) , 2019, WMT.

[197]  Mauro Cettolo,et al.  Overview of the IWSLT 2017 Evaluation Campaign , 2017, IWSLT.

[198]  Jason Lee,et al.  Fully Character-Level Neural Machine Translation without Explicit Segmentation , 2016, TACL.

[199]  Thorsten Brants,et al.  Large Language Models in Machine Translation , 2007, EMNLP.

[200]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[201]  Jindrich Libovický,et al.  Neural Monkey: An Open-source Tool for Sequence Learning , 2017, Prague Bull. Math. Linguistics.

[202]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[203]  George F. Foster,et al.  Cost Weighting for Neural Machine Translation Domain Adaptation , 2017, NMT@ACL.

[204]  Ahmed Rafea,et al.  Enhancing Translation from English to Arabic Using Two-Phase Decoder Translation , 2018, IntelliSys.

[205]  Alexander M. Fraser,et al.  Modeling Target-Side Inflection in Neural Machine Translation , 2017, WMT.

[206]  Marta R. Costa-jussà,et al.  Byte-based Neural Machine Translation , 2017, SWCN@EMNLP.

[207]  Marcin Junczys-Dowmunt,et al.  Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora , 2018, WMT.

[208]  Hermann Ney,et al.  The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task , 2018, WMT.

[209]  Adam Coates,et al.  Cold Fusion: Training Seq2Seq Models Together with Language Models , 2017, INTERSPEECH.

[210]  Ming Zhou,et al.  Unsupervised Neural Machine Translation with SMT as Posterior Regularization , 2019, AAAI.

[211]  David Grangier,et al.  Vocabulary Selection Strategies for Neural Machine Translation , 2016, ArXiv.

[212]  Qiang Zhang,et al.  Variational Self-attention Model for Sentence Representation , 2018, ArXiv.

[213]  Fei-Fei Li,et al.  Visualizing and Understanding Recurrent Networks , 2015, ArXiv.

[214]  Huanbo Luan,et al.  Improving the Transformer Translation Model with Document-Level Context , 2018, EMNLP.

[215]  Yang Liu,et al.  Visualizing and Understanding Neural Machine Translation , 2017, ACL.

[216]  Yaohua Tang,et al.  Neural Machine Translation with External Phrase Memory , 2016, ArXiv.

[217]  Nadir Durrani,et al.  Edinburgh’s Phrase-based Machine Translation Systems for WMT-14 , 2014, WMT@ACL.

[218]  Rico Sennrich,et al.  Predicting Target Language CCG Supertags Improves Neural Machine Translation , 2017, WMT.

[219]  David Vilar,et al.  Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models , 2018, NAACL.

[220]  Adrià de Gispert,et al.  Source sentence simplification for statistical machine translation , 2017, Comput. Speech Lang..

[221]  Bashô Matsuo Basho: The Complete Haiku , 2008 .

[222]  Antonio Toral,et al.  A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions , 2017, EACL.

[223]  Yoshua Bengio,et al.  Context-dependent word representation for neural machine translation , 2016, Comput. Speech Lang..

[224]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[225]  Noah A. Smith,et al.  The Web as a Parallel Corpus , 2003, CL.

[226]  Jing Yang,et al.  Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT , 2018, NLPCC.

[227]  Richard M. Schwartz,et al.  Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[228]  Rico Sennrich,et al.  Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures , 2018, EMNLP.

[229]  Khalil Sima'an,et al.  Graph Convolutional Encoders for Syntax-aware Neural Machine Translation , 2017, EMNLP.

[230]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[231]  Hermann Ney,et al.  Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information , 2017, WMT.

[232]  Holger Schwenk,et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.

[233]  John DeNero,et al.  Adding Interpretable Attention to Neural Translation Models Improves Word Alignment , 2019, ArXiv.

[234]  Taku Kudo,et al.  Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates , 2018, ACL.

[235]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[236]  Taku Kudo,et al.  SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.

[237]  Huda Khayrallah,et al.  Neural Lattice Search for Domain Adaptation in Machine Translation , 2017, IJCNLP.

[238]  Philipp Koehn,et al.  Factored Translation Models , 2007, EMNLP.

[239]  Bill Byrne,et al.  Syntactically Guided Neural Machine Translation , 2016, ACL.

[240]  Stephen Clark,et al.  Syntax-Based Grammaticality Improvement using CCG and Guided Search , 2011, EMNLP.

[241]  Giorgio Satta,et al.  Generalized Multitext Grammars , 2004, ACL.

[242]  Dipankar Das,et al.  SMT vs NMT: A Comparison over Hindi and Bengali Simple Sentences , 2018, ICON.

[243]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[244]  Marcello Federico,et al.  Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors , 2017, INTERSPEECH.

[245]  Yachao Li,et al.  Neural Machine Translation with Phrasal Attention , 2017, CWMT.

[246]  Satoshi Nakamura,et al.  Guiding Neural Machine Translation with Retrieved Translation Pieces , 2018, NAACL.

[247]  Masao Utiyama,et al.  Sentence Embedding for Neural Machine Translation Domain Adaptation , 2017, ACL.

[248]  Graham Neubig,et al.  Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016 , 2016, WAT@COLING.

[249]  William J. Byrne,et al.  Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices , 2010, ACL.

[250]  Khalil Sima'an,et al.  Modeling Latent Sentence Structure in Neural Machine Translation , 2019, ArXiv.

[251]  Lemao Liu,et al.  Deterministic Attention for Sequence-to-Sequence Constituent Parsing , 2017, AAAI.

[252]  Suyog Gupta,et al.  To prune, or not to prune: exploring the efficacy of pruning for model compression , 2017, ICLR.

[253]  Christopher D. Manning,et al.  Compression of Neural Machine Translation Models via Pruning , 2016, CoNLL.

[254]  Alexander H. Waibel,et al.  Automatic translation from parallel speech: Simultaneous interpretation as MT training data , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[255]  Yee Whye Teh,et al.  A fast and simple algorithm for training neural probabilistic language models , 2012, ICML.

[256]  Satoshi Nakamura,et al.  Multi-Source Neural Machine Translation with Data Augmentation , 2018, IWSLT.

[257]  Ondrej Bojar,et al.  Morphological and Language-Agnostic Word Segmentation for NMT , 2018, TSD.

[258]  Matthias Sperber,et al.  XNMT: The eXtensible Neural Machine Translation Toolkit , 2018, AMTA.

[259]  Kevin Knight,et al.  11,001 New Features for Statistical Machine Translation , 2009, NAACL.

[260]  Marcus Tomalin,et al.  Word Ordering with Phrase-Based Grammars , 2014, EACL.

[261]  Stephen Clark,et al.  Discriminative Syntax-Based Word Ordering for Text Generation , 2015, CL.

[262]  Hichem Sahbi,et al.  Consensus Network Decoding for Statistical Machine Translation System Combination , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[263]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[264]  Gholamreza Haffari,et al.  Sequence to Sequence Mixture Model for Diverse Machine Translation , 2018, CoNLL.

[265]  Shujian Huang,et al.  Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder , 2017, ACL.

[266]  Marcello Federico,et al.  Deep Neural Machine Translation with Weakly-Recurrent Units , 2018, EAMT.

[267]  Steve Renals,et al.  Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[268]  Jakob Uszkoreit,et al.  Blockwise Parallel Decoding for Deep Autoregressive Models , 2018, NeurIPS.

[269]  Roland Kuhn,et al.  Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation , 2010, EMNLP.

[270]  Myle Ott,et al.  Scaling Neural Machine Translation , 2018, WMT.

[271]  Lior Rokach,et al.  Ensemble-based classifiers , 2010, Artificial Intelligence Review.

[272]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[273]  Graham Neubig,et al.  A Tree-based Decoder for Neural Machine Translation , 2018, EMNLP.

[274]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[275]  Xiaocheng Feng,et al.  Adaptive Multi-pass Decoder for Neural Machine Translation , 2018, EMNLP.

[276]  Aurko Roy,et al.  Fast Decoding in Sequence Models using Discrete Latent Variables , 2018, ICML.

[277]  Alexander M. Rush,et al.  Sequence-to-Sequence Learning as Beam-Search Optimization , 2016, EMNLP.

[278]  Raj Dabre,et al.  Neural Machine Translation: Basics, Practical Aspects and Recent Trends , 2017, IJCNLP.

[279]  Raymond Hendy Susanto,et al.  The CoNLL-2014 Shared Task on Grammatical Error Correction , 2014, CoNLL Shared Task.

[280]  Koray Kavukcuoglu,et al.  Visual Attention , 2020, Computational Models for Cognitive Vision.

[281]  Alexander M. Rush,et al.  Word Ordering Without Syntax , 2016, EMNLP.

[282]  Aaron C. Courville,et al.  Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks , 2018, ICLR.

[283]  Ivan Skorokhodov,et al.  Semi-Supervised Neural Machine Translation with Language Models , 2018, LoResMT@AMTA.

[284]  Jingbo Zhu,et al.  Handling Many-To-One UNK Translation for Neural Machine Translation , 2017, CWMT.

[285]  Paul Buitelaar,et al.  Augmenting Neural Machine Translation with Knowledge Graphs , 2019, ArXiv.

[286]  Jordan L. Boyd-Graber,et al.  Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation , 2014, EMNLP.

[287]  Andy Way,et al.  SMT versus NMT: Preliminary comparisons for Irish , 2018, LoResMT@AMTA.

[288]  Xing Wang,et al.  Modeling Recurrence for Transformer , 2019, NAACL.

[289]  Tanja Schmidt,et al.  How to Move to Neural Machine Translation for Enterprise-Scale Programs - An Early Adoption Case Study , 2018, EAMT.

[290]  R. French Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[291]  David Chiang,et al.  An Attentional Model for Speech Translation Without Transcription , 2016, NAACL.

[292]  Eunsol Choi,et al.  Coarse-to-Fine Question Answering for Long Documents , 2016, ACL.

[293]  Huda Khayrallah,et al.  Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering , 2018, WMT.

[294]  Alfred V. Aho,et al.  Syntax Directed Translations and the Pushdown Assembler , 1969, J. Comput. Syst. Sci..

[295]  Stephen Clark,et al.  Jointly learning sentence embeddings and syntax with unsupervised Tree-LSTMs , 2017, Natural Language Engineering.

[296]  Victor O. K. Li,et al.  Trainable Greedy Decoding for Neural Machine Translation , 2017, EMNLP.

[297]  Wenhu Chen,et al.  Guided Alignment Training for Topic-Aware Neural Machine Translation , 2016, AMTA.

[298]  Mark Fishel,et al.  Confidence through Attention , 2017, MTSummit.

[299]  Yoav Goldberg,et al.  Morphological Inflection Generation with Hard Monotonic Attention , 2016, ACL.

[300]  Yoshua Bengio,et al.  End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.

[301]  Jan Niehues,et al.  Effective Strategies in Zero-Shot Neural Machine Translation , 2017, IWSLT.

[302]  Yifan Gong,et al.  Restructuring of deep neural network acoustic models with singular value decomposition , 2013, INTERSPEECH.

[303]  Andrei Popescu-Belis,et al.  Self-Attentive Residual Decoder for Neural Machine Translation , 2017, NAACL.

[304]  Tobias Domhan,et al.  How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures , 2018, ACL.

[305]  William J. Byrne,et al.  N-gram posterior probability confidence measures for statistical machine translation: an empirical study , 2012, Machine Translation.

[306]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[307]  Valentin Malykh,et al.  Self-Attentive Model for Headline Generation , 2019, ECIR.

[308]  Helen Yannakoudakis,et al.  Neural Sequence-Labelling Models for Grammatical Error Correction , 2017, EMNLP.

[309]  Ashish Vaswani,et al.  Decoding with Large-Scale Neural Language Models Improves Translation , 2013, EMNLP.

[310]  Jakob Uszkoreit,et al.  A Decomposable Attention Model for Natural Language Inference , 2016, EMNLP.

[311]  Shi Feng,et al.  Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model , 2016, ArXiv.

[312]  Geoffrey E. Hinton,et al.  Regularizing Neural Networks by Penalizing Confident Output Distributions , 2017, ICLR.

[313]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[314]  Masashi Toyoda,et al.  A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size , 2017, WAT@IJCNLP.

[315]  Matt Post,et al.  A Call for Clarity in Reporting BLEU Scores , 2018, WMT.

[316]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[317]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[318]  Adrià de Gispert,et al.  Multi-representation ensembles and delayed SGD updates improve syntax-based NMT , 2018, ACL.

[319]  Christof Monz,et al.  Dynamic Data Selection for Neural Machine Translation , 2017, EMNLP.

[320]  Rongrong Ji,et al.  Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation , 2016, AAAI.

[321]  Kevin Knight,et al.  Multi-Source Neural Translation , 2016, NAACL.

[322]  Philip Gage,et al.  A new algorithm for data compression , 1994 .

[323]  Grzegorz Chrupala,et al.  Analyzing and interpreting neural networks for NLP: A report on the first BlackboxNLP workshop , 2019, Natural Language Engineering.

[324]  Jiajun Zhang,et al.  Exploiting Source-side Monolingual Data in Neural Machine Translation , 2016, EMNLP.

[325]  Fethi Bougares,et al.  Neural Machine Translation by Generating Multiple Linguistic Factors , 2017, SLSP.

[326]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[327]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[328]  Yoshimasa Tsuruoka,et al.  Incorporating Source-Side Phrase Structures into Neural Machine Translation , 2019, Computational Linguistics.

[329]  Desmond Elliott,et al.  Findings of the Third Shared Task on Multimodal Machine Translation , 2018, WMT.

[330]  Andy Way,et al.  Exploiting Cross-Sentence Context for Neural Machine Translation , 2017, EMNLP.

[331]  Omer Levy,et al.  Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation , 2019, EMNLP.

[332]  Anoop Sarkar,et al.  Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering , 2013, EMNLP.

[333]  Shankar Kumar,et al.  Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2004, NAACL.

[334]  Thorsten Brants,et al.  One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.

[335]  Chenhui Chu,et al.  Kyoto University Participation to WAT 2016 , 2016, COLING 2016.

[336]  Alexander J. Smola,et al.  Language Models with Transformers , 2019, ArXiv.

[337]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[338]  Cyril Allauzen,et al.  Pushdown Automata in Statistical Machine Translation , 2014, CL.

[339]  Bill Byrne,et al.  An Operation Sequence Model for Explainable Neural Machine Translation , 2018, BlackboxNLP@EMNLP.

[340]  Amanda Stent,et al.  Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers) , 2018, North American Chapter of the Association for Computational Linguistics.

[341]  David Barber,et al.  Generative Neural Machine Translation , 2018, NeurIPS.

[342]  Xu Sun,et al.  Deconvolution-Based Global Decoding for Neural Machine Translation , 2018, COLING.

[343]  Jingtao Yao,et al.  Chunk-based Decoder for Neural Machine Translation , 2017, ACL.

[344]  Wojciech Samek,et al.  Methods for interpreting and understanding deep neural networks , 2017, Digit. Signal Process..

[345]  I. Dan Melamed,et al.  Multitext Grammars and Synchronous Parsers , 2003, NAACL.

[346]  Paris Smaragdis,et al.  NoiseOut: A Simple Way to Prune Neural Networks , 2016, ArXiv.

[347]  Kenny Q. Zhu,et al.  Controlling Length in Abstractive Summarization Using a Convolutional Neural Network , 2018, EMNLP.

[348]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[349]  Colin Raffel,et al.  Online and Linear-Time Attention by Enforcing Monotonic Alignments , 2017, ICML.

[350]  Shankar Kumar,et al.  Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2008, EMNLP.

[351]  Sunita Sarawagi,et al.  Calibration of Encoder Decoder Models for Neural Machine Translation , 2019, ArXiv.

[352]  Minh Le Nguyen,et al.  Regularizing Forward and Backward Decoding to Improve Neural Machine Translation , 2018, 2018 10th International Conference on Knowledge and Systems Engineering (KSE).

[353]  Ian McGraw,et al.  On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[354]  Matt Post,et al.  We start by defining the recurrent architecture as implemented in Sockeye, following , 2018 .

[355]  Ankur Bapna,et al.  Revisiting Character-Based Neural Machine Translation with Capacity and Compression , 2018, EMNLP.

[356]  Alex Wang,et al.  Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling , 2018, ArXiv.

[357]  Satoshi Nakamura,et al.  Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015 , 2015, WAT.

[358]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[359]  Yonatan Belinkov,et al.  What do Neural Machine Translation Models Learn about Morphology? , 2017, ACL.

[360]  Ole Winther,et al.  Recurrent Relational Networks , 2017, NeurIPS.

[361]  Andy Way,et al.  Investigating Backtranslation in Neural Machine Translation , 2018, EAMT.

[362]  Philipp Koehn,et al.  Six Challenges for Neural Machine Translation , 2017, NMT@ACL.

[363]  Ted Briscoe,et al.  Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction , 2017, ACL.

[364]  Marine Carpuat,et al.  Bi-Directional Neural Machine Translation with Synthetic Parallel Data , 2018, NMT@ACL.

[365]  Matt Post,et al.  Grammatical Error Correction with Neural Reinforcement Learning , 2017, IJCNLP.

[366]  Jean Oh,et al.  Attention-based Multimodal Neural Machine Translation , 2016, WMT.

[367]  Ho-Gyeong Kim,et al.  Knowledge Distillation Using Output Errors for Self-attention End-to-end Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[368]  Hai Zhao,et al.  Finding Better Subword Segmentation for Neural Machine Translation , 2018, CCL.

[369]  Marcello Federico,et al.  Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English , 2017, Prague Bull. Math. Linguistics.

[370]  Marcin Andrychowicz,et al.  Neural Random Access Machines , 2015, ERCIM News.

[371]  Deyi Xiong,et al.  Two Effective Approaches to Data Reduction for Neural Machine Translation: Static and Dynamic Sentence Selection , 2018, 2018 International Conference on Asian Language Processing (IALP).

[372]  Rico Sennrich,et al.  Linguistic Input Features Improve Neural Machine Translation , 2016, WMT.

[373]  Myle Ott,et al.  Understanding Back-Translation at Scale , 2018, EMNLP.

[374]  Lukasz Kaiser,et al.  Depthwise Separable Convolutions for Neural Machine Translation , 2017, ICLR.

[375]  Miguel Ballesteros,et al.  Pieces of Eight: 8-bit Neural Machine Translation , 2018, NAACL.

[376]  Mehryar Mohri Edit-Distance Of Weighted Automata: General Definitions And Algorithms , 2003, Int. J. Found. Comput. Sci..

[377]  Gonzalo Iglesias,et al.  Speed-Constrained Tuning for Statistical Machine Translation Using Bayesian Optimization , 2016, HLT-NAACL.

[378]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[379]  Marcus Tomalin,et al.  A Comparison of Neural Models for Word Ordering , 2017, INLG.

[380]  Changhan Wang,et al.  Levenshtein Transformer , 2019, NeurIPS.

[381]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[382]  Gholamreza Haffari,et al.  Iterative Back-Translation for Neural Machine Translation , 2018, NMT@ACL.

[383]  Yoshua Bengio,et al.  Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[384]  Wenhu Chen,et al.  Triangular Architecture for Rare Language Translation , 2018, ACL.

[385]  Tomoki Toda,et al.  Speed or accuracy? a study in evaluation of simultaneous speech translation , 2015, INTERSPEECH.

[386]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[387]  Pierre Isabelle,et al.  A Challenge Set Approach to Evaluating Machine Translation , 2017, EMNLP.

[388]  Xinlei Chen,et al.  Visualizing and Understanding Neural Models in NLP , 2015, NAACL.

[389]  Artem Sokolov,et al.  Optimally Segmenting Inputs for NMT Shows Preference for Character-Level Processing , 2018 .

[390]  Noah A. Smith,et al.  Recurrent Neural Network Grammars , 2016, NAACL.

[391]  Jiajun Zhang,et al.  A Comparable Study on Model Averaging, Ensembling and Reranking in NMT , 2018, NLPCC.

[392]  Xin Wang,et al.  Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation , 2019, NAACL.

[393]  Nadir Durrani,et al.  The Operation Sequence Model—Combining N-Gram-Based and Phrase-Based Statistical Machine Translation , 2015, CL.

[394]  Fethi Bougares,et al.  Factored Neural Machine Translation Architectures , 2016, IWSLT.

[395]  Joel R. Tetreault,et al.  JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction , 2017, EACL.

[396]  Marta R. Costa-jussà,et al.  Coverage for Character Based Neural Machine Translation , 2017, Proces. del Leng. Natural.

[397]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[398]  Graham Neubig,et al.  Rapid Adaptation of Neural Machine Translation to New Languages , 2018, EMNLP.

[399]  Yoshua Bengio,et al.  Blocks and Fuel: Frameworks for deep learning , 2015, ArXiv.

[400]  Geoffrey E. Hinton,et al.  Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..

[401]  Jiajun Zhang,et al.  Towards Zero Unknown Word in Neural Machine Translation , 2016, IJCAI.

[402]  Jugal K. Kalita,et al.  Parallel Attention Mechanisms in Neural Machine Translation , 2018, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA).

[403]  Vaibhava Goel,et al.  Segmental minimum Bayes-risk ASR voting strategies , 2000, INTERSPEECH.

[404]  Lijun Wu,et al.  Achieving Human Parity on Automatic Chinese to English News Translation , 2018, ArXiv.

[405]  Dario Amodei,et al.  An Empirical Model of Large-Batch Training , 2018, ArXiv.

[406]  Srinivas Bangalore,et al.  A Finite-State Approach to Machine Translation , 2001, NAACL.

[407]  Quoc V. Le,et al.  Multi-task Sequence to Sequence Learning , 2015, ICLR.

[408]  Yoshua Bengio,et al.  Multi-way, multilingual neural machine translation , 2017, Comput. Speech Lang..

[409]  André F. T. Martins,et al.  Sparse and Constrained Attention for Neural Machine Translation , 2018, ACL.

[410]  Christof Monz,et al.  What does Attention in Neural Machine Translation Pay Attention to? , 2017, IJCNLP.

[411]  Gholamreza Haffari,et al.  Towards Decoding as Continuous Optimisation in Neural Machine Translation , 2017, EMNLP.

[412]  Chris Dyer,et al.  Sentence Encoding with Tree-constrained Relation Networks , 2018, ArXiv.

[413]  Qun Liu,et al.  An error analysis for image-based multi-modal neural machine translation , 2019, Machine Translation.

[414]  Philipp Koehn,et al.  Neural Machine Translation , 2017, ArXiv.

[415]  Jianfeng Gao,et al.  A Nested Attention Neural Hybrid Model for Grammatical Error Correction , 2017, ACL.

[416]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[417]  Aman Hussain,et al.  Text Normalization using Memory Augmented Neural Networks , 2019, Speech Commun..

[418]  Jakob Uszkoreit,et al.  Insertion Transformer: Flexible Sequence Generation via Insertion Operations , 2019, ICML.

[419]  Stefan Riezler,et al.  Multimodal Pivots for Image Caption Translation , 2016, ACL.

[420]  Eneko Agirre,et al.  Learning bilingual word embeddings with (almost) no bilingual data , 2017, ACL.

[421]  Lemao Liu,et al.  Agreement on Target-bidirectional Neural Machine Translation , 2016, NAACL.

[422]  Rico Sennrich,et al.  Context-Aware Neural Machine Translation Learns Anaphora Resolution , 2018, ACL.

[423]  Jingbo Zhu,et al.  A Simple and Effective Approach to Coverage-Aware Neural Machine Translation , 2018, ACL.

[424]  Laura Tomasello Neural Machine Translation and Artificial Intelligence: What Is Left for the Human Translator? , 2019 .

[425]  Yoshua Bengio,et al.  On Using Monolingual Corpora in Neural Machine Translation , 2015, ArXiv.

[426]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[427]  Yang Liu,et al.  Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention , 2016, ArXiv.

[428]  Enhong Chen,et al.  Regularizing Neural Machine Translation by Target-bidirectional Agreement , 2018, AAAI.

[429]  Sebastian Möller,et al.  Train, Sort, Explain: Learning to Diagnose Translation Models , 2019, NAACL.

[430]  Yang Liu,et al.  Minimum Risk Training for Neural Machine Translation , 2015, ACL.

[431]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[432]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[433]  Noah A. Smith,et al.  Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[434]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[435]  Josep Maria Crego,et al.  Domain Control for Neural Machine Translation , 2016, RANLP.

[436]  Zhiguo Wang,et al.  Vocabulary Manipulation for Neural Machine Translation , 2016, ACL.

[437]  Hua Wu,et al.  Modeling Coherence for Discourse Neural Machine Translation , 2018, AAAI.

[438]  Lemao Liu,et al.  Instance Weighting for Neural Machine Translation Domain Adaptation , 2017, EMNLP.

[439]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[440]  Alexander H. Waibel,et al.  Simultaneous translation of lectures and speeches , 2007, Machine Translation.

[441]  Philipp Koehn,et al.  Scalable Modified Kneser-Ney Language Model Estimation , 2013, ACL.

[442]  Rico Sennrich,et al.  The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT , 2016, WMT.

[443]  Yong Zhang,et al.  Attention pooling-based convolutional neural network for sentence modelling , 2016, Inf. Sci..

[444]  Yoshua Bengio,et al.  Professor Forcing: A New Algorithm for Training Recurrent Networks , 2016, NIPS.

[445]  Marcis Pinnis,et al.  Neural Machine Translation for Morphologically Rich Languages with Improved Sub-word Units and Synthetic Data , 2017, TSD.

[446]  Phil Blunsom,et al.  Learning to Transduce with Unbounded Memory , 2015, NIPS.

[447]  Orhan Firat,et al.  Does Neural Machine Translation Benefit from Larger Context? , 2017, ArXiv.

[448]  Orhan Firat,et al.  Massively Multilingual Neural Machine Translation , 2019, NAACL.

[449]  Ming Zhou,et al.  Fluency Boost Learning and Inference for Neural Grammatical Error Correction , 2018, ACL.

[450]  Mauro Cettolo,et al.  A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation , 2018, COLING.

[451]  Lijun Wu,et al.  A Study of Reinforcement Learning for Neural Machine Translation , 2018, EMNLP.

[452]  Gregory Kell Overcoming Catastrophic Forgetting in Neural Machine Translation , 2018 .

[453]  Ted Briscoe,et al.  Language Model Based Grammatical Error Correction without Annotated Training Data , 2018, BEA@NAACL-HLT.

[454]  Tie-Yan Liu,et al.  Adversarial Neural Machine Translation , 2017, ACML.

[455]  Yiming Yang,et al.  Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.

[456]  Hermann Ney,et al.  The RWTH Aachen Machine Translation System for WMT 2010 , 2010, IWSLT.

[457]  Hermann Ney,et al.  Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation , 2018, EMNLP.

[458]  Marta R. Costa-jussà,et al.  (Self-Attentive) Autoencoder-based Universal Language Representation for Machine Translation , 2018, ArXiv.

[459]  Kaitao Song,et al.  Hybrid Self-Attention Network for Machine Translation , 2018, ArXiv.

[460]  Bowen Zhou,et al.  A Structured Self-attentive Sentence Embedding , 2017, ICLR.

[461]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[462]  Naren Ramakrishnan,et al.  Deep Reinforcement Learning for Sequence-to-Sequence Models , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[463]  Kyunghyun Cho,et al.  Can neural machine translation do simultaneous translation? , 2016, ArXiv.

[464]  Yang Liu,et al.  THUMT: An Open-Source Toolkit for Neural Machine Translation , 2017, AMTA.

[465]  Yoav Goldberg,et al.  A Primer on Neural Network Models for Natural Language Processing , 2015, J. Artif. Intell. Res..

[466]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[467]  Zachary Chase Lipton The mythos of model interpretability , 2016, ACM Queue.

[468]  Yang Feng,et al.  Joint Decoding with Multiple Translation Models , 2009, ACL/IJCNLP.

[469]  Hwee Tou Ng,et al.  Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English , 2013, BEA@NAACL-HLT.

[470]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[471]  Brian Roark,et al.  Neural Models of Text Normalization for Speech Applications , 2019, Computational Linguistics.

[472]  Lemao Liu,et al.  Neural Machine Translation with Supervised Attention , 2016, COLING.

[473]  Pavel Levin,et al.  Toward a full-scale neural machine translation in production: the Booking.com use case , 2017, MTSUMMIT.

[474]  Yoshua Bengio,et al.  Fine-grained attention mechanism for neural machine translation , 2018, Neurocomputing.

[475]  Chao-Hong Liu,et al.  The RGNLP Machine Translation Systems for WAT 2018 , 2018, PACLIC.

[476]  Min Zhang,et al.  Neural Machine Translation Advised by Statistical Machine Translation , 2016, AAAI.

[477]  Nikolaos Pappas,et al.  Global-Context Neural Machine Translation through Target-Side Attentive Residual Connections , 2017, ArXiv.

[478]  Hermann Ney,et al.  HMM-Based Word Alignment in Statistical Translation , 1996, COLING.

[479]  Alexander H. Waibel,et al.  Training speech translation from audio recordings of interpreter-mediated communication , 2013, Comput. Speech Lang..

[480]  M. Powell The BOBYQA algorithm for bound constrained optimization without derivatives , 2009 .

[481]  Satoshi Nakamura,et al.  An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation , 2017, NMT@ACL.

[482]  Bill Byrne,et al.  The CUED's Grammatical Error Correction Systems for BEA-2019 , 2019, BEA@ACL.

[483]  D. Sculley,et al.  Winner's Curse? On Pace, Progress, and Empirical Rigor , 2018, ICLR.

[484]  Atsushi Fujita,et al.  A Smorgasbord of Features to Combine Phrase-Based and Neural Machine Translation , 2018, AMTA.

[485]  Philipp Koehn,et al.  A Systematic Analysis of Translation Model Search Spaces , 2009, WMT@EACL.

[486]  Yang Feng,et al.  Memory-augmented Neural Machine Translation , 2017, EMNLP.

[487]  Chenhui Chu,et al.  A Comprehensive Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation , 2018, J. Inf. Process..

[488]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[489]  Yufeng Chen,et al.  A Method of Unknown Words Processing for Neural Machine Translation Using HowNet , 2017, CWMT.

[490]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[491]  Qi Liu,et al.  Insertion-based Decoding with Automatically Inferred Generation Order , 2019, Transactions of the Association for Computational Linguistics.

[492]  Samy Bengio,et al.  Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.

[493]  Philipp Koehn,et al.  Findings of the 2018 Conference on Machine Translation (WMT18) , 2018, WMT.

[494]  Hongfei Xu,et al.  Neutron: An Implementation of the Transformer Translation Model and its Variants , 2019, ArXiv.

[495]  Nenghai Yu,et al.  Dual Supervised Learning , 2017, ICML.

[496]  Maosong Sun,et al.  Semi-Supervised Learning for Neural Machine Translation , 2016, ACL.

[497]  David Chiang,et al.  Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation , 2017, IJCNLP.

[498]  Justin Luitjens,et al.  A GPU-based WFST Decoder with Exact Lattice Generation , 2018, INTERSPEECH.

[499]  Ondrej Bojar,et al.  Paying Attention to Multi-Word Expressions in Neural Machine Translation , 2017, MTSUMMIT.

[500]  R. Venkatesh Babu,et al.  Data-free Parameter Pruning for Deep Neural Networks , 2015, BMVC.

[501]  Kenneth Heafield,et al.  Copied Monolingual Data Improves Low-Resource Neural Machine Translation , 2017, WMT.

[502]  Guillaume Lample,et al.  Phrase-Based & Neural Unsupervised Machine Translation , 2018, EMNLP.

[503]  Yoshua Bengio,et al.  Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism , 2016, NAACL.

[504]  Babak Hassibi,et al.  Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.

[505]  Pushpak Bhattacharyya,et al.  Faster Decoding for Subword Level Phrase-based SMT between Related Languages , 2016, VarDial@COLING.

[506]  George F. Foster,et al.  Reinforcement Learning based Curriculum Optimization for Neural Machine Translation , 2019, NAACL.

[507]  Quan Z. Sheng,et al.  Generating Textual Adversarial Examples for Deep Learning Models: A Survey , 2019, ArXiv.

[508]  Guillaume Lample,et al.  Unsupervised Machine Translation Using Monolingual Corpora Only , 2017, ICLR.

[509]  Francisco Casacuberta,et al.  NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning , 2018, Prague Bull. Math. Linguistics.

[510]  Chong Wang,et al.  Neural Phrase-to-Phrase Machine Translation , 2018, ArXiv.

[511]  Philipp Koehn,et al.  Syntax-based Statistical Machine Translation , 2016, Synthesis Lectures on Human Language Technologies.

[512]  François Yvon,et al.  Using Monolingual Data in Neural Machine Translation: a Systematic Study , 2018, WMT.

[513]  José A. R. Fonollosa,et al.  Character-based Neural Machine Translation , 2016, ACL.

[514]  Bo Wang,et al.  SYSTRAN's Pure Neural Machine Translation Systems , 2016, ArXiv.

[515]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.

[516]  Jian Cheng,et al.  Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[517]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[518]  Barnabás Póczos,et al.  Competence-based Curriculum Learning for Neural Machine Translation , 2019, NAACL.

[519]  Yoav Goldberg,et al.  Assessing BERT's Syntactic Abilities , 2019, ArXiv.

[520]  Hava T. Siegelmann,et al.  On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci..

[521]  Ashish Vaswani,et al.  Self-Attention with Relative Position Representations , 2018, NAACL.

[522]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[523]  Shan Wu,et al.  Variational Recurrent Neural Machine Translation , 2018, AAAI.

[524]  Jörg Tiedemann,et al.  Neural Machine Translation with Extended Context , 2017, DiscoMT@EMNLP.

[525]  Yoshua Bengio,et al.  On integrating a language model into neural machine translation , 2017, Comput. Speech Lang..

[526]  William Lewis,et al.  Skype Translator: Breaking down language and hearing barriers. A behind the scenes look at near real-time speech translation , 2015, TC.

[527]  Helen Yannakoudakis,et al.  Neural and FST-based approaches to grammatical error correction , 2019, BEA@ACL.

[528]  Quoc V. Le,et al.  Unsupervised Pretraining for Sequence to Sequence Learning , 2016, EMNLP.

[529]  Mingbo Ma,et al.  Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation , 2018, EMNLP.

[530]  Jimmy Lin,et al.  Data-Intensive Text Processing with MapReduce , 2010, Synthesis Lectures on Human Language Technologies.

[531]  Richard Socher,et al.  Weighted Transformer Network for Machine Translation , 2017, ArXiv.

[532]  Jianfeng Gao,et al.  A Diversity-Promoting Objective Function for Neural Conversation Models , 2015, NAACL.

[533]  Wei Chen,et al.  Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets , 2017, NAACL.

[534]  Bill Byrne,et al.  On NMT Search Errors and Model Errors: Cat Got Your Tongue? , 2019, EMNLP.

[535]  Guodong Zhou,et al.  Cache-based Document-level Neural Machine Translation , 2017, ArXiv.

[536]  Jiajun Zhang,et al.  Neural System Combination for Machine Translation , 2017, ACL.

[537]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[538]  Josef van Genabith,et al.  USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing , 2016, WMT.

[539]  Yang Liu,et al.  Learning to Remember Translation History with a Continuous Cache , 2017, TACL.

[540]  Thomas H. Cormen,et al.  Introduction to Algorithms , 2009, MIT Press.

[541]  Ming Zhou,et al.  Sequence-to-Dependency Neural Machine Translation , 2017, ACL.

[542]  Markus Freitag,et al.  Ensemble Distillation for Neural Machine Translation , 2017, ArXiv.

[543]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[544]  Graham Neubig,et al.  Neural Machine Translation and Sequence-to-sequence Models: A Tutorial , 2017, ArXiv.

[545]  Dianhai Yu,et al.  Multi-Task Learning for Multiple Language Translation , 2015, ACL.

[546]  Matthias Sperber,et al.  Neural Lattice-to-Sequence Models for Uncertain Inputs , 2017, EMNLP.

[547]  Jan Niehues,et al.  Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder , 2016, IWSLT.

[548]  Wei Chen,et al.  Sogou Neural Machine Translation Systems for WMT17 , 2017, WMT.

[549]  Richard E. Fairley,et al.  Guide to the Software Engineering Body of Knowledge (SWEBOK(R)): Version 3.0 , 2014 .

[550]  Praveen Dakwale,et al.  Fine-Tuning for Neural Machine Translation with Limited Degradation across In- and Out-of-Domain Data , 2017, MTSUMMIT.

[551]  Philipp Koehn,et al.  Findings of the 2017 Conference on Machine Translation (WMT17) , 2017, WMT.

[552]  Chris Dyer,et al.  Notes on Noise Contrastive Estimation and Negative Sampling , 2014, ArXiv.

[553]  Graham Neubig,et al.  Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers , 2013, ACL.

[554]  Mehryar Mohri,et al.  Speech Recognition with Weighted Finite-State Transducers , 2008 .

[555]  Yichao Lu,et al.  A neural interlingua for multilingual machine translation , 2018, WMT.

[556]  Mikio Yamamoto,et al.  Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation , 2016, WAT@COLING.

[557]  Johan Schalkwyk,et al.  OpenFst: A General and Efficient Weighted Finite-State Transducer Library , 2007, CIAA.

[558]  G. Seth Psychology of Language , 1968, Nature.

[559]  Richard Socher,et al.  A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.

[560]  Tie-Yan Liu,et al.  Dual Learning for Machine Translation , 2016, NIPS.

[561]  Iryna Gurevych,et al.  Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks , 2016, COLING.

[562]  Deniz Yuret,et al.  Transfer Learning for Low-Resource Neural Machine Translation , 2016, EMNLP.

[563]  Satoshi Nakamura,et al.  Improving Neural Machine Translation through Phrase-based Forced Decoding , 2017, IJCNLP.

[564]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[565]  Lei Yu,et al.  The Neural Noisy Channel , 2016, ICLR.

[566]  Takenobu Tokunaga,et al.  Key-value Attention Mechanism for Neural Machine Translation , 2017, IJCNLP.

[567]  Rico Sennrich,et al.  How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs , 2016, EACL.

[568]  Alex Waibel,et al.  Adaptation of the translation model for statistical machine translation based on information retrieval , 2005, EAMT.

[569]  Hua Wu,et al.  Improved Neural Machine Translation with SMT Features , 2016, AAAI.

[570]  Navdeep Jaitly,et al.  Towards Better Decoding and Language Model Integration in Sequence to Sequence Models , 2016, INTERSPEECH.

[571]  Ian Goodfellow,et al.  Deep Learning , 2016, MIT Press.

[572]  Graham Neubig,et al.  Controlling Output Length in Neural Encoder-Decoders , 2016, EMNLP.

[573]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[574]  Roi Livni,et al.  On the Computational Efficiency of Training Neural Networks , 2014, NIPS.

[575]  Donald O. Hebb The Organization of Behavior: A Neuropsychological Theory , 1949, Wiley.

[576]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[577]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[578]  Mehryar Mohri,et al.  On the Disambiguation of Weighted Automata , 2014, CIAA.

[579]  Andy Way,et al.  A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators , 2017, MTSUMMIT.

[580]  Quoc V. Le,et al.  Effective Domain Mixing for Neural Machine Translation , 2017, WMT.

[581]  William A. Woods Transition Network Grammars for Natural Language Analysis , 1970, Communications of the ACM.

[582]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[583]  Zhiguo Wang,et al.  Supervised Attentions for Neural Machine Translation , 2016, EMNLP.

[584]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[585]  Michael Carl,et al.  Post-editing neural machine translation versus phrase-based machine translation for English–Chinese , 2019, Machine Translation.

[586]  Byron C. Wallace,et al.  Attention is not Explanation , 2019, NAACL.

[587]  Jordan B. Pollack,et al.  Recursive Distributed Representations , 1990, Artif. Intell..

[588]  Mirella Lapata,et al.  Long Short-Term Memory-Networks for Machine Reading , 2016, EMNLP.

[589]  Kyunghyun Cho,et al.  Non-Monotonic Sequential Text Generation , 2019, ICML.

[590]  Dylan Cashman,et al.  RNNbow: Visualizing Learning Via Backpropagation Gradients in RNNs , 2018, IEEE Computer Graphics and Applications.

[591]  Quoc V. Le,et al.  Addressing the Rare Word Problem in Neural Machine Translation , 2014, ACL.

[592]  Marcin Junczys-Dowmunt,et al.  Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation , 2018, NAACL.

[593]  Zheng Zhang,et al.  Star-Transformer , 2019, NAACL.

[594]  Manaal Faruqui,et al.  Cross-lingual Models of Word Embeddings: An Empirical Comparison , 2016, ACL.

[595]  Yang Liu,et al.  Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation , 2015, IJCAI.

[596]  Arianna Bisazza,et al.  Neural versus phrase-based MT quality: An in-depth analysis on English-German and English-French , 2018, Comput. Speech Lang..

[597]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[598]  Elizabeth Salesky,et al.  Optimizing segmentation granularity for neural machine translation , 2018, Machine Translation.

[599]  J. Kiefer,et al.  Sequential minimax search for a maximum , 1953 .

[600]  Di He,et al.  Decoding with Value Networks for Neural Machine Translation , 2017, NIPS.

[601]  Rui Yan,et al.  Natural Language Inference by Tree-Based Convolution and Heuristic Matching , 2015, ACL.

[602]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[603]  Min Zhang,et al.  Incorporating Statistical Machine Translation Word Knowledge Into Neural Machine Translation , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[604]  Ryan Cotterell,et al.  Neural Multi-Source Morphological Reinflection , 2016, EACL.

[605]  Tiejun Zhao,et al.  Syntax-Directed Attention for Neural Machine Translation , 2017, AAAI.

[606]  Ji Zhang,et al.  Semi-Autoregressive Neural Machine Translation , 2018, EMNLP.

[607]  Yufeng Chen,et al.  An Unknown Word Processing Method in NMT by Integrating Syntactic Structure and Semantic Concept , 2017, CWMT.

[608]  Chandra Bhagavatula,et al.  Semi-supervised sequence tagging with bidirectional language models , 2017, ACL.

[609]  Yonatan Belinkov,et al.  Synthetic and Natural Noise Both Break Neural Machine Translation , 2017, ICLR.

[610]  Been Kim,et al.  Towards A Rigorous Science of Interpretable Machine Learning , 2017, 1702.08608.

[611]  Gabriel Synnaeve,et al.  A Fully Differentiable Beam Search Decoder , 2019, ICML.

[612]  Pascal Vincent,et al.  Hierarchical Memory Networks , 2016, ArXiv.

[613]  Dan Liu,et al.  Learning Efficient Lexically-Constrained Neural Machine Translation with External Memory , 2019, ArXiv.

[614]  Rui Wang,et al.  A Survey of Domain Adaptation for Neural Machine Translation , 2018, COLING.

[615]  Graham Neubig,et al.  SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation , 2018, EMNLP.

[616]  Quoc V. Le,et al.  Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[617]  Di He,et al.  Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation , 2018, NeurIPS.

[618]  Rico Sennrich,et al.  Nematus: a Toolkit for Neural Machine Translation , 2017, EACL.

[619]  Yoshimasa Tsuruoka,et al.  Tree-to-Sequence Attentional Neural Machine Translation , 2016, ACL.

[620]  Tao Shen,et al.  DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding , 2017, AAAI.

[621]  Dekai Wu,et al.  Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora , 1997, CL.

[622]  Hwee Tou Ng,et al.  Better Evaluation for Grammatical Error Correction , 2012, NAACL.

[623]  Vlad Zhukov,et al.  Differentiable lower bound for expected BLEU score , 2017, ArXiv.

[624]  Jiajun Zhang,et al.  Bridging Neural Machine Translation and Bilingual Dictionaries , 2016, ArXiv.

[625]  David Chiang,et al.  Improving Lexical Choice in Neural Machine Translation , 2017, NAACL.

[626]  William J. Byrne,et al.  Hierarchical Phrase-Based Translation with Weighted Finite State Transducers , 2009, HLT-NAACL.

[627]  Xiaolin Wang,et al.  CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++ , 2018, EMNLP.

[628]  Li Gong,et al.  Tencent Neural Machine Translation Systems for WMT18 , 2018, WMT.

[629]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[630]  Ming Zhou,et al.  Machine Translation Detection from Monolingual Web-Text , 2013, ACL.

[631]  Nadir Durrani,et al.  QCRI Machine Translation Systems for IWSLT 16 , 2017, ArXiv.

[632]  Rico Sennrich,et al.  Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.

[633]  Gholamreza Haffari,et al.  Selective Attention for Context-aware Neural Machine Translation , 2019, NAACL.

[634]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[635]  Li Zhao,et al.  Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization , 2018, AAAI.

[636]  Ming Zhou,et al.  Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study , 2018, ArXiv.

[637]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[638]  Qun Liu,et al.  Memory-enhanced Decoder for Neural Machine Translation , 2016, EMNLP.

[639]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[640]  Mark J. F. Gales,et al.  Sequence Student-Teacher Training of Deep Neural Networks , 2016, INTERSPEECH.

[641]  Yoshimasa Tsuruoka,et al.  Learning to Parse and Translate Improves Neural Machine Translation , 2017, ACL.

[642]  Yuji Matsumoto,et al.  The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings , 2012, COLING.

[643]  Miles Osborne,et al.  Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[644]  Quoc V. Le,et al.  Don't Decay the Learning Rate, Increase the Batch Size , 2017, ICLR.

[645]  Daniel Jurafsky,et al.  Neural Language Correction with Character-Based Attention , 2016, ArXiv.

[646]  Mirella Lapata,et al.  Text Generation from Knowledge Graphs with Graph Transformers , 2019, NAACL.

[647]  William J. Byrne,et al.  Hierarchical Phrase-based Translation Representations , 2011, EMNLP.

[648]  Jan Niehues,et al.  Analyzing Neural MT Search and Model Performance , 2017, NMT@ACL.

[649]  Yonatan Belinkov,et al.  Neural Machine Translation Training in a Multi-Domain Scenario , 2017, IWSLT.

[650]  Graham Neubig,et al.  Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014 , 2014, WAT.

[651]  Huda Khayrallah,et al.  Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation , 2018, NMT@ACL.

[652]  Ashwin K. Vijayakumar,et al.  Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models , 2016, ArXiv.

[653]  Chris Quirk,et al.  MT Detection in Web-Scraped Parallel Corpora , 2011, MTSUMMIT.

[654]  Richard Socher,et al.  Learned in Translation: Contextualized Word Vectors , 2017, NIPS.

[655]  Mohit Iyyer,et al.  Syntactically Supervised Transformers for Faster Neural Machine Translation , 2019, ACL.

[656]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[657]  Holger Schwenk,et al.  Continuous Space Translation Models for Phrase-Based Statistical Machine Translation , 2012, COLING.

[658]  Lucia Specia,et al.  Exploiting Objective Annotations for Minimising Translation Post-editing Effort , 2011, EAMT.

[659]  Mehryar Mohri,et al.  Finite-State Transducers in Language and Speech Processing , 1997, CL.

[660]  Ole Winther,et al.  Convolutional LSTM Networks for Subcellular Localization of Proteins , 2015, AlCoB.

[661]  Markus Freitag,et al.  Fast Domain Adaptation for Neural Machine Translation , 2016, ArXiv.

[662]  Matiss Rikters,et al.  Debugging Neural Machine Translations , 2018, Doctoral Consortium/Forum@DB&IS.

[663]  Graham Neubig,et al.  On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models , 2019, NAACL.

[664]  Lucia Specia,et al.  Findings of the WMT 2018 Shared Task on Quality Estimation , 2018, WMT.

[665]  Lemao Liu,et al.  Sentence Selection and Weighting for Neural Machine Translation Domain Adaptation , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[666]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[667]  Navdeep Jaitly,et al.  RNN Approaches to Text Normalization: A Challenge , 2016, ArXiv.

[668]  Ted Briscoe,et al.  Grammatical error correction using neural machine translation , 2016, NAACL.

[669]  Marcin Junczys-Dowmunt,et al.  Microsoft’s Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data , 2018, WMT.

[670]  Hermann Ney,et al.  Alignment-Based Neural Machine Translation , 2016, WMT.

[671]  Alexander M. Rush,et al.  Sequence-Level Knowledge Distillation , 2016, EMNLP.

[672]  Markus Freitag,et al.  Beam Search Strategies for Neural Machine Translation , 2017, NMT@ACL.

[673]  Enhong Chen,et al.  Bidirectional Generative Adversarial Networks for Neural Machine Translation , 2018, CoNLL.

[674]  Philip Resnik,et al.  Mining the Web for Bilingual Text , 1999, ACL.

[675]  Andy Way,et al.  Is Neural Machine Translation the New State of the Art? , 2017, Prague Bull. Math. Linguistics.

[676]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[677]  Kyunghyun Cho,et al.  Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model , 2016, ArXiv.

[678]  Christopher D. Manning,et al.  Stanford Neural Machine Translation Systems for Spoken Language Domains , 2015, IWSLT.

[679]  Preslav Nakov,et al.  What Is in a Translation Unit? Comparing Character and Subword Representations Beyond Translation , 2018 .

[680]  Matthias Sperber,et al.  Lecture Translator - Speech translation framework for simultaneous lecture translation , 2016, NAACL.

[681]  Rico Sennrich,et al.  The University of Edinburgh’s Neural MT Systems for WMT17 , 2017, WMT.

[682]  Gregory Shakhnarovich,et al.  A Systematic Exploration of Diversity in Machine Translation , 2013, EMNLP.

[683]  Gholamreza Haffari,et al.  Document Context Neural Machine Translation with Memory Networks , 2017, ACL.

[684]  Wei Zhao,et al.  Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data , 2019, NAACL.

[685]  Boris Ginsburg,et al.  OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models , 2018, ArXiv.

[686]  Rico Sennrich,et al.  Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation , 2018, EMNLP.

[687]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[688]  Kevin Knight A Statistical MT Tutorial Workbook , 1999 .

[689]  Shuming Shi,et al.  Neural Machine Translation with Adequacy-Oriented Learning , 2018, AAAI.

[690]  Kenneth Heafield,et al.  Multi-Source Syntactic Neural Machine Translation , 2018, EMNLP.

[691]  Bill Byrne,et al.  The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16 , 2016, WMT.

[692]  Eneko Agirre,et al.  Unsupervised Neural Machine Translation , 2017, ICLR.

[693]  Jason Lee,et al.  Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement , 2018, EMNLP.

[694]  Andrei Popescu-Belis,et al.  Context in Neural Machine Translation: A Review of Models and Evaluations , 2019, ArXiv.

[695]  Mehryar Mohri,et al.  A weight pushing algorithm for large vocabulary speech recognition , 2001, INTERSPEECH.

[696]  Alan Ritter,et al.  Adversarial Learning for Neural Dialogue Generation , 2017, EMNLP.

[697]  Douwe Kiela,et al.  No Training Required: Exploring Random Encoders for Sentence Classification , 2019, ICLR.

[698]  Yidong Chen,et al.  Lattice-to-sequence attentional Neural Machine Translation models , 2018, Neurocomputing.

[699]  Alexander M. Fraser,et al.  Target-side Word Segmentation Strategies for Neural Machine Translation , 2017, WMT.

[700]  Hai Zhao,et al.  Exploring Recombination for Efficient Decoding of Neural Machine Translation , 2018, EMNLP.

[701]  Cícero Nogueira dos Santos,et al.  Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts , 2014, COLING.

[702]  Simon Osindero,et al.  Recursive Recurrent Nets with Attention Modeling for OCR in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[703]  Koray Kavukcuoglu,et al.  Learning word embeddings efficiently with noise-contrastive estimation , 2013, NIPS.

[704]  Christof Monz,et al.  The Importance of Being Recurrent for Modeling Hierarchical Structure , 2018, EMNLP.

[705]  Lei Yu,et al.  Online Segment to Segment Neural Transduction , 2016, EMNLP.

[706]  Andreas Maletti,et al.  Recurrent Neural Networks as Weighted Language Recognizers , 2017, NAACL.

[707]  Gholamreza Haffari,et al.  Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model , 2017, COLING.

[708]  Graham Neubig,et al.  Improving Robustness of Machine Translation with Synthetic Noise , 2019, NAACL.

[709]  R. S. Milton,et al.  Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages , 2016, ArXiv.

[710]  Shamil Chollampatt,et al.  Connecting the Dots: Towards Human-Level Grammatical Error Correction , 2017, BEA@EMNLP.

[711]  Christopher Joseph Pal,et al.  Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[712]  Yoshua Bengio,et al.  Montreal Neural Machine Translation Systems for WMT’15 , 2015, WMT@EMNLP.

[713]  Yoshua Bengio,et al.  Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation , 2014, SSST@EMNLP.

[714]  Massimo Piccardi,et al.  English-Basque Statistical and Neural Machine Translation , 2018, LREC.

[715]  Yaser Al-Onaizan,et al.  Goodness: A Method for Measuring Machine Translation Confidence , 2011, ACL.

[716]  Felix Hieber,et al.  Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning , 2017, EMNLP.

[717]  Lior Wolf,et al.  Non-Adversarial Unsupervised Word Translation , 2018, EMNLP.

[718]  Victor O. K. Li,et al.  Non-Autoregressive Neural Machine Translation , 2017, ICLR.

[719]  Shamil Chollampatt,et al.  Neural Network Translation Models for Grammatical Error Correction , 2016, IJCAI.

[720]  Hans Uszkoreit,et al.  Fine-grained evaluation of German-English Machine Translation based on a Test Suite , 2018, WMT.

[721]  Misha Denil,et al.  Predicting Parameters in Deep Learning , 2013, NIPS.

[722]  Sungzoon Cho,et al.  Distance-based Self-Attention Network for Natural Language Inference , 2017, ArXiv.

[723]  Sergei Nirenburg,et al.  Integrating Translations from Multiple Sources within the PANGLOSS Mark III Machine Translation System , 1994, AMTA.

[724]  Alexander M. Rush,et al.  Unsupervised Recurrent Neural Network Grammars , 2019, NAACL.

[725]  Oriol Vinyals,et al.  Neural machine translation systems with rare word processing , 2015 .

[726]  Enhong Chen,et al.  Joint Training for Neural Machine Translation Models with Monolingual Data , 2018, AAAI.

[727]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[728]  Jason Weston,et al.  A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.

[729]  Mirella Lapata,et al.  Language to Logical Form with Neural Attention , 2016, ACL.

[730]  Marc'Aurelio Ranzato,et al.  Classical Structured Prediction Losses for Sequence to Sequence Learning , 2017, NAACL.

[731]  Gonzalo Iglesias,et al.  Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation , 2018, AMTA.

[732]  mmarai UNIVERSITE DU MAINE , 2015 .

[733]  Bill Byrne,et al.  SGNMT – A Flexible NMT Decoding Platform for Quick Prototyping of New Models and Search Strategies , 2017, EMNLP.