Paper information - The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks (Hochreiter and Schmidhuber, 1997) are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but are also applicable to seemingly unrelated fields such as bioinformatics (Min et al., 2016). Recent work on contextual word embeddings such as BERT (Devlin et al., 2019) reports state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were tailored much more closely to the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This thesis can be understood as an antithesis to the prevailing paradigm. We show how traditional symbolic statistical machine translation models (Koehn, 2009) can still improve neural machine translation (NMT; Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015) while reducing the risk of common NMT pathologies such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural models to correct grammatical errors in text.

Felix Stahlberg

[1] Gonzalo Iglesias,et al. Transducer Disambiguation with Sparse Topological Features , 2015, EMNLP.

[2] Ming Zhou,et al. Bilingual Data Cleaning for SMT using Graph-based Random Walk , 2013, ACL.

[3] Marcin Junczys-Dowmunt,et al. Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions , 2016, IWSLT.

[4] Hwee Tou Ng,et al. The CoNLL-2013 Shared Task on Grammatical Error Correction , 2013, CoNLL Shared Task.

[5] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[6] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Tomoki Toda,et al. Ckylark: A More Robust PCFG-LA Parser , 2015, HLT-NAACL.

[8] Shi Feng,et al. Pathologies of Neural Models Make Interpretations Difficult , 2018, EMNLP.

[9] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[10] Wei Wu,et al. Phrase-level Self-Attention Networks for Universal Sentence Encoding , 2018, EMNLP.

[11] H. Ng,et al. A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction , 2018, AAAI.

[12] Yong Cheng,et al. Neural Machine Translation with Key-Value Memory-Augmented Attention , 2018, IJCAI.

[13] Adrià de Gispert,et al. Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices , 2016, EACL.

[14] Hermann Ney,et al. Word-Level Confidence Estimation for Machine Translation using Phrase-Based Translation Models , 2005, HLT.

[15] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[16] Marc'Aurelio Ranzato,et al. Analyzing Uncertainty in Neural Machine Translation , 2018, ICML.

[17] Holger Schwenk,et al. Investigations on large-scale lightly-supervised training for statistical machine translation. , 2008, IWSLT.

[18] Jingbo Zhu,et al. Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation , 2017, EMNLP.

[19] Jacob Devlin,et al. Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU , 2017, EMNLP.

[20] Yoshua Bengio,et al. On Using Very Large Target Vocabulary for Neural Machine Translation , 2014, ACL.

[21] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[22] Joelle Pineau,et al. Language GANs Falling Short , 2018, ICLR.

[23] Angela Fan,et al. Controllable Abstractive Summarization , 2017, NMT@ACL.

[24] Kevin Duh,et al. On the Elements of an Accurate Tree-to-String Machine Translation System , 2014, ACL.

[25] Rico Sennrich,et al. Regularization techniques for fine-tuning in neural machine translation , 2017, EMNLP.

[26] Graham Neubig,et al. A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models , 2017, AAAI.

[27] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[28] Holger Schwenk,et al. N-gram-based machine translation enhanced with neural networks , 2010, IWSLT.

[29] Yang Liu,et al. Recursive Autoencoders for ITG-Based Translation , 2013, EMNLP.

[30] Rico Sennrich,et al. A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation , 2018, WMT.

[31] Mingbo Ma,et al. When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size) , 2017, EMNLP.

[32] Franz Josef Och,et al. Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[33] Ondrej Bojar,et al. Training Tips for the Transformer Model , 2018, Prague Bull. Math. Linguistics.

[34] Koray Kavukcuoglu,et al. Pixel Recurrent Neural Networks , 2016, ICML.

[35] Stephan Vogel,et al. The QCRI Recognition System for Handwritten Arabic , 2015, ICIAP.

[36] Shahram Khadivi,et al. Parallel Corpus Refinement as an Outlier Detection Algorithm , 2011, MTSUMMIT.

[37] Petr Motlícek,et al. Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition , 2012, INTERSPEECH.

[38] Xing Li,et al. STACL: Simultaneous Translation with Integrated Anticipation and Controllable Latency , 2018, ArXiv.

[39] Dan Klein,et al. When and why are log-linear models self-normalizing? , 2015, NAACL.

[40] Di He,et al. Non-Autoregressive Machine Translation with Auxiliary Regularization , 2019, AAAI.

[41] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.

[42] Stephen Clark,et al. Syntax-Based Word Ordering Incorporating a Large-Scale Language Model , 2012, EACL.

[43] Shahram Khadivi,et al. Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search , 2017, EMNLP.

[44] Zhiguo Wang,et al. Coverage Embedding Models for Neural Machine Translation , 2016, EMNLP.

[45] Alexander M. Rush,et al. Adapting Sequence Models for Sentence Correction , 2017, EMNLP.

[46] Mark Fishel,et al. Multi-Domain Neural Machine Translation , 2018, EAMT.

[47] Yang Liu,et al. Modeling Coverage for Neural Machine Translation , 2016, ACL.

[48] Min Zhang,et al. Variational Neural Machine Translation , 2016, EMNLP.

[49] Carlos Guestrin,et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[50] Yongqiang Wang,et al. Efficient lattice rescoring using recurrent neural network language models , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[51] Diego Marcheggiani,et al. Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks , 2018, NAACL.

[52] Khalil Sima'an,et al. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description , 2016, WMT.

[53] Toshiaki Nakazawa,et al. ASPEC: Asian Scientific Paper Excerpt Corpus , 2016, LREC.

[54] Oleksandr Makeyev,et al. Neural network with ensembles , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[55] John DeNero,et al. Variable-Length Word Encodings for Neural Translation Models , 2015, EMNLP.

[56] Satoshi Nakamura,et al. Incorporating Discrete Translation Lexicons into Neural Machine Translation , 2016, EMNLP.

[57] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.

[58] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[59] Tommi S. Jaakkola,et al. A causal framework for explaining the predictions of black-box sequence-to-sequence models , 2017, EMNLP.

[60] Ole Winther,et al. Neural Machine Translation with Characters and Hierarchical Encoding , 2016, ArXiv.

[61] Kevin Knight,et al. Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words , 2018, ArXiv.

[62] Jonathan G. Fiscus,et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[63] Chengqi Zhang,et al. Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling , 2018, IJCAI.

[64] Geoffrey E. Hinton,et al. Grammar as a Foreign Language , 2014, NIPS.

[65] Tiejun Zhao,et al. Forest-Based Neural Machine Translation , 2018, ACL.

[66] Thomas Fang Zheng,et al. Enhanced neural machine translation by learning from draft , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[67] Pushpak Bhattacharyya,et al. Learning variable length units for SMT between related languages via Byte Pair Encoding , 2016, SWCN@EMNLP.

[68] Heike Adel,et al. Exploring Different Dimensions of Attention for Uncertainty Detection , 2016, EACL.

[69] Rongrong Ji,et al. Asynchronous Bidirectional Decoding for Neural Machine Translation , 2018, AAAI.

[70] W. Byrne,et al. Generalization and maximum likelihood from small data sets , 1993, Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop.

[71] Yonatan Belinkov,et al. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models , 2018, AAAI.

[72] Desmond Elliott,et al. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description , 2017, WMT.

[73] T. Kathirvalavakumar,et al. Pruning algorithms of neural networks — a comparative study , 2013, Central European Journal of Computer Science.

[74] Yue Zhang,et al. Transition-Based Syntactic Linearization , 2015, NAACL.

[75] Pushpak Bhattacharyya,et al. Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages , 2018, NAACL.

[76] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.

[77] Pontus Stenetorp,et al. Transition-based Dependency Parsing Using Recursive Neural Networks , 2013.

[78] Ming Zhou,et al. Reinforced Mnemonic Reader for Machine Reading Comprehension , 2017, IJCAI.

[79] Graham Neubig,et al. Learning to Translate in Real-time with Neural Machine Translation , 2016, EACL.

[80] Tomas Mikolov,et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.

[81] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[82] Philipp Koehn,et al. An Analysis of Source Context Dependency in Neural Machine Translation , 2018, EAMT.

[83] Veselin Stoyanov,et al. Simple Fusion: Return of the Language Model , 2018, WMT.

[84] Yang Liu,et al. Joint Training for Pivot-based Neural Machine Translation , 2016, IJCAI.

[85] Bowen Zhou,et al. Pointing the Unknown Words , 2016, ACL.

[86] Kenneth Heafield,et al. KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[87] Jörg Tiedemann,et al. Neural machine translation for low-resource languages , 2017, ArXiv.

[88] Yoav Goldberg,et al. Towards String-To-Tree Neural Machine Translation , 2017, ACL.

[89] Razvan Pascanu,et al. A simple neural network module for relational reasoning , 2017, NIPS.

[90] Jeffrey Pennington,et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[91] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.

[92] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[93] Kenneth Heafield,et al. Fast Neural Machine Translation Implementation , 2018, NMT@ACL.

[94] Nadir Durrani,et al. A Joint Sequence Translation Model with Integrated Reordering , 2011, ACL.

[95] Yue Zhang,et al. Transition-Based Syntactic Linearization with Lookahead Features , 2016, HLT-NAACL.

[96] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..

[97] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011.

[98] Shankar Kumar,et al. Local Phrase Reordering Models for Statistical Machine Translation , 2005, HLT.

[99] Xu Sun,et al. Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation , 2018, EMNLP.

[100] Taro Watanabe,et al. Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection , 2018, WMT.

[101] Yoshua Bengio,et al. An Empirical Investigation of Catastrophic Forgeting in Gradient-Based Neural Networks , 2013, ICLR.

[102] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[103] Jan Niehues,et al. Pre-Translation for Neural Machine Translation , 2016, COLING.

[104] Samuel R. Bowman,et al. Do latent tree learning models identify meaningful structure in sentences? , 2017, TACL.

[105] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.

[106] Yann Dauphin,et al. Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.

[107] David Chiang,et al. A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[108] Jindrich Libovický,et al. End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification , 2018, EMNLP.

[109] Markus Freitag,et al. Attention-based Vocabulary Selection for NMT Decoding , 2017, ArXiv.

[110] Young-Kil Kim,et al. Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages , 2017, LREC.

[111] Guillaume Lample,et al. Word Translation Without Parallel Data , 2017, ICLR.

[112] M. J. D. Powell,et al. An efficient method for finding the minimum of a function of several variables without calculating derivatives , 1964, Comput. J..

[113] Deyi Xiong,et al. A GRU-Gated Attention Model for Neural Machine Translation , 2017, ArXiv.

[114] Yoshua Bengio,et al. Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning , 2017, Rep4NLP@ACL.

[115] Yonatan Belinkov,et al. NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks , 2018, AAAI.

[116] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[117] Rich Caruana,et al. Model compression , 2006, KDD '06.

[118] Jerome R. Bellegarda,et al. A latent semantic analysis framework for large-Span language modeling , 1997, EUROSPEECH.

[119] Yun Chen,et al. A Stable and Effective Learning Strategy for Trainable Greedy Decoding , 2018, EMNLP.

[120] Christopher D. Manning,et al. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models , 2016, ACL.

[121] Samy Bengio,et al. Tensor2Tensor for Neural Machine Translation , 2018, AMTA.

[122] David Chiang,et al. Correcting Length Bias in Neural Machine Translation , 2018, WMT.

[123] Marine Carpuat,et al. Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation , 2017, NMT@ACL.

[124] Leonid Sigal,et al. Middle-Out Decoding , 2018, NeurIPS.

[125] Chenhui Chu,et al. A Survey of Multilingual Neural Machine Translation , 2019, ACM Comput. Surv..

[126] James Henderson,et al. Document-Level Neural Machine Translation with Hierarchical Attention Networks , 2018, EMNLP.

[127] Desmond Elliott,et al. Multilingual Image Description with Neural Sequence Models , 2015, arXiv:1510.04709.

[128] Lemao Liu,et al. Neural Machine Translation with Source Dependency Representation , 2017, EMNLP.

[129] Hua Wu,et al. Multi-channel Encoder for Neural Machine Translation , 2017, AAAI.

[130] Hans Uszkoreit,et al. Deeper Machine Translation and Evaluation for German , 2016, DMTW.

[131] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.

[132] Yoshua Bengio,et al. Audio Chord Recognition with Recurrent Neural Networks , 2013, ISMIR.

[133] Holger Schwenk,et al. Continuous Space Language Models for Statistical Machine Translation , 2006, ACL.

[134] Christopher Kermorvant,et al. The A2iA Arabic Handwritten Text Recognition System at the Open HaRT2013 Evaluation , 2014, 2014 11th IAPR International Workshop on Document Analysis Systems.

[135] Alexander M. Rush,et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.

[136] Bill Byrne,et al. Domain Adaptive Inference for Neural Machine Translation , 2019, ACL.

[137] Graham Neubig,et al. MTNT: A Testbed for Machine Translation of Noisy Text , 2018, EMNLP.

[138] Sunita Sarawagi,et al. Length bias in Encoder Decoder Models and a Case for Global Conditioning , 2016, EMNLP.

[139] Huda Khayrallah,et al. Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation , 2018, WMT.

[140] Colin Raffel,et al. Learning Hard Alignments with Variational Inference , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[141] Jonathan G. Fiscus,et al. Overview of the NIST 2016 LoReHLT evaluation , 2017, Machine Translation.

[142] Hermann Ney,et al. Statistical multi-source translation , 2001, MTSUMMIT.

[143] Huda Khayrallah,et al. On the Impact of Various Types of Noise on Neural Machine Translation , 2018, NMT@ACL.

[144] Rico Sennrich,et al. Evaluating Discourse Phenomena in Neural Machine Translation , 2017, NAACL.

[145] Andy Way,et al. Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation , 2018, WMT.

[146] Jianfeng Gao,et al. Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[147] Bill Byrne,et al. Neural Grammatical Error Correction with Finite State Transducers , 2019, NAACL.

[148] Hang Li,et al. Neural Responding Machine for Short-Text Conversation , 2015, ACL.

[149] William J. Byrne,et al. Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars , 2010, CL.

[150] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.

[151] Mariana L. Neves,et al. The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine , 2016, LREC.

[152] Tetsuji Nakagawa,et al. An Empirical Study of Language Relatedness for Transfer Learning in Neural Machine Translation , 2017, PACLIC.

[153] Andy Way,et al. Neural Pre-Translation for Hybrid Machine Translation , 2017, MTSUMMIT.

[154] Yang Liu,et al. Towards Robust Neural Machine Translation , 2018, ACL.

[155] Yoshua Bengio,et al. A Character-level Decoder without Explicit Segmentation for Neural Machine Translation , 2016, ACL.

[156] Alexandre Allauzen,et al. Continuous Space Translation Models with Neural Networks , 2012, NAACL.

[157] Yann Dauphin,et al. A Convolutional Encoder Model for Neural Machine Translation , 2016, ACL.

[158] Bill Byrne,et al. Unfolding and Shrinking Neural Machine Translation Ensembles , 2017, EMNLP.

[159] C. Lee Giles,et al. The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations , 2017, ArXiv.

[160] Naoaki Okazaki,et al. Positional Encoding to Control Output Sequence Length , 2019, NAACL.

[161] Atsushi Fujita,et al. Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation , 2018, NMT@ACL.

[162] Gholamreza Haffari,et al. Incorporating Structural Alignment Biases into an Attentional Neural Translation Model , 2016, NAACL.

[163] Yonatan Belinkov,et al. Identifying and Controlling Important Neurons in Neural Machine Translation , 2018, ICLR.

[164] Di He,et al. Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input , 2018, AAAI.

[165] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[166] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[167] Adrià de Gispert,et al. The University of Cambridge’s Machine Translation Systems for WMT18 , 2018, WMT.

[168] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.

[169] Jungi Kim,et al. Boosting Neural Machine Translation , 2016, IJCNLP.

[170] Joan Bruna,et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.

[171] Daniel Jurafsky,et al. Mutual Information and Diverse Decoding Improve Neural Machine Translation , 2016, ArXiv.

[172] Geoffrey E. Hinton,et al. Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.

[173] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[174] Yang Liu,et al. Neural Machine Translation with Reconstruction , 2016, AAAI.

[175] Tara N. Sainath,et al. Learning compact recurrent neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[176] Yoshua Bengio,et al. Better Mixing via Deep Representations , 2012, ICML.

[177] Joakim Nivre,et al. An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation , 2018, WMT.

[178] Hermann Ney,et al. On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation , 2018, WMT.

[179] Brian Roark,et al. The OpenGrm open-source finite-state grammar software libraries , 2012, ACL.

[180] Karin M. Verspoor,et al. Findings of the 2016 Conference on Machine Translation , 2016, WMT.

[181] Quoc V. Le,et al. The Evolved Transformer , 2019, ICML.

[182] Guillaume Lample,et al. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties , 2018, ACL.

[183] Byunghan Lee,et al. Deep learning in bioinformatics , 2016, Briefings Bioinform..

[184] Mikko Kurimo,et al. Minimum Bayes Risk Combination of Translation Hypotheses from Alternative Morphological Decompositions , 2009, NAACL.

[185] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[186] Marcin Junczys-Dowmunt,et al. Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction , 2016, EMNLP.

[187] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.