On the use of prior and external knowledge in neural sequence models

Neural sequence models have recently achieved great success across a variety of natural language processing tasks. In practice, however, they require massive amounts of annotated training data to reach their full potential, and such data are not always available for the languages, domains, or tasks at hand. Prior and external knowledge provides additional contextual information that can improve modelling performance and compensate for the lack of large training corpora, particularly in low-resource settings. In this thesis, we investigate the usefulness of prior and external knowledge for improving neural sequence models. We propose the use of various kinds of prior and external knowledge and present approaches for integrating them into both the training and inference phases of neural sequence models. The main contributions of this thesis are summarised in two major parts.

The first part concerns Training and Modelling of neural sequence models. Here, we investigate different situations (particularly low-resource settings) in which prior and external knowledge, such as side information, linguistic factors, and monolingual data, is shown to substantially improve the performance of neural sequence models. In addition, we introduce a new means of incorporating prior and external knowledge based on the moment matching framework. This framework exploits prior and external knowledge as global features of generated sequences, in order to improve the overall quality of the desired output sequence.

The second part concerns Decoding of neural sequence models. We propose a novel decoding framework based on relaxed continuous optimisation, addressing one of the drawbacks of existing approximate decoding methods: their limited ability to incorporate global factors due to intractable search.
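To make the moment matching idea concrete, the following is a minimal toy sketch (not the thesis's actual implementation): a hypothetical global feature map over output sequences, and a penalty that matches the mean feature statistics of model samples against those of reference sequences. The feature map (sequence length plus unigram counts) is an illustrative assumption.

```python
from collections import Counter

def global_features(sentence):
    # Hypothetical global feature map: sequence length plus unigram counts.
    feats = Counter({"<len>": len(sentence)})
    feats.update(sentence)
    return feats

def moment_matching_penalty(sampled, references):
    # Match first-order moments: average the feature vectors over each set
    # of sequences, then penalise the squared L2 distance between the means.
    def mean_feats(seqs):
        total = Counter()
        for s in seqs:
            total.update(global_features(s))
        return {k: v / len(seqs) for k, v in total.items()}

    mu_sample, mu_ref = mean_feats(sampled), mean_feats(references)
    keys = set(mu_sample) | set(mu_ref)
    return sum((mu_sample.get(k, 0.0) - mu_ref.get(k, 0.0)) ** 2 for k in keys)
```

In training, a penalty of this form would be added to the usual likelihood objective, encouraging the model's generated sequences to agree with the references on the chosen global statistics.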
We hope that this PhD thesis, comprising the two major parts above, will shed light on the use of prior and external knowledge in neural sequence models, in both their training and decoding phases.
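The relaxed continuous decoding idea can likewise be sketched in miniature. Rather than searching over discrete token sequences, each output position is represented by an unconstrained logit vector, relaxed to a probability distribution via softmax, and a global scoring function is optimised by gradient ascent before discretising. The toy numerical gradient and scoring function below are assumptions for illustration only.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def relaxed_decode(score_fn, vocab_size, length, steps=100, lr=0.5):
    # Continuous relaxation: one unconstrained logit vector per output
    # position, optimised by coordinate-wise gradient ascent on a global
    # score defined over the relaxed (softmax) distributions.
    logits = [[0.0] * vocab_size for _ in range(length)]
    eps = 1e-4
    for _ in range(steps):
        for t in range(length):
            for v in range(vocab_size):
                # Toy central-difference gradient of the global score.
                logits[t][v] += eps
                up = score_fn([softmax(row) for row in logits])
                logits[t][v] -= 2 * eps
                down = score_fn([softmax(row) for row in logits])
                logits[t][v] += eps
                logits[t][v] += lr * (up - down) / (2 * eps)
    # Discretise the relaxed solution by taking the argmax per position.
    return [max(range(vocab_size), key=lambda v: row[v]) for row in logits]
```

Because the score is evaluated on whole relaxed sequences, global factors (e.g. the moment matching features above, or any document-level constraint) can be folded directly into `score_fn`, which is precisely what intractable discrete search makes difficult.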
