Providing Morphological Information for SMT Using Neural Networks

Abstract: Treating morphologically complex words (MCWs) as atomic units in translation does not yield desirable results. Such words are complex constituents with meaningful subunits. A complex word in a morphologically rich language (MRL) can correspond to several words, or even a full sentence, in a morphologically simpler language, which means the surface forms of complex words should be accompanied by auxiliary morphological information in order to produce precise translations and better alignments. In this paper we follow this idea and propose two different methods for conveying such information to statistical machine translation (SMT) models. In the first model we enrich factored SMT engines by introducing a new morphological factor that relies on subword-aware word embeddings. In the second model we focus on the language-modelling component: we explore a subword-level neural language model (NLM) that captures sequence-, word- and subword-level dependencies. Our NLM approximates better scores for conditional word probabilities, so the decoder generates more fluent translations. We studied two languages, Farsi and German, in our experiments and observed significant improvements for both.
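The abstract only sketches the two models, so the following minimal PyTorch sketch illustrates the general idea rather than the authors' implementation: a word vector composed from its morphological subunits (the subword-aware factor of the first model) feeds an LSTM language model that scores conditional word probabilities (the subword-level NLM of the second model). All class names, variable names and sizes (MorphemeWordEmbedder, SubwordLM, the toy dimensions) are illustrative assumptions, not details taken from the paper.

# A minimal sketch, not the paper's implementation: compose word vectors from
# morpheme embeddings and score P(w_t | w_<t) with an LSTM language model.
import torch
import torch.nn as nn


class MorphemeWordEmbedder(nn.Module):
    """Composes a word vector from embeddings of its morphemes
    (e.g., Morfessor segments), so related surface forms share parameters."""

    def __init__(self, n_morphemes, dim):
        super().__init__()
        self.morph_emb = nn.Embedding(n_morphemes, dim, padding_idx=0)

    def forward(self, morph_ids):          # (batch, words, max_morphs)
        vecs = self.morph_emb(morph_ids)   # (batch, words, max_morphs, dim)
        mask = (morph_ids != 0).unsqueeze(-1).float()
        # Mean over the word's morphemes; a CNN or sum would also work.
        return (vecs * mask).sum(2) / mask.sum(2).clamp(min=1.0)


class SubwordLM(nn.Module):
    """LSTM language model over morpheme-composed word vectors, producing
    log P(w_t | w_<t) at every position."""

    def __init__(self, n_morphemes, vocab_size, dim=256, hidden=512):
        super().__init__()
        self.embedder = MorphemeWordEmbedder(n_morphemes, dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, morph_ids):
        h, _ = self.lstm(self.embedder(morph_ids))
        return torch.log_softmax(self.out(h), dim=-1)


# Toy usage: score a 5-word "sentence" whose words have up to 3 morphemes each.
model = SubwordLM(n_morphemes=1000, vocab_size=5000)
morph_ids = torch.randint(1, 1000, (1, 5, 3))     # (batch, words, morphemes)
targets = torch.randint(0, 5000, (1, 5))          # surface-form word ids
log_probs = model(morph_ids)
sentence_score = log_probs[0, torch.arange(4), targets[0, 1:]].sum()
print(float(sentence_score))  # a log-probability an SMT decoder could use as a feature

In a factored SMT setting the composed word vectors would serve as an additional factor alongside the surface form, while the sentence-level score above corresponds to the language-model feature the decoder consults during search.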
