Paper information - A Neural Network Architecture for Detecting Grammatical Errors in Statistical Machine Translation

Abstract: In this paper we present a Neural Network (NN) architecture for detecting grammatical errors in Statistical Machine Translation (SMT) using monolingual morpho-syntactic word representations in combination with surface and syntactic context windows. We test our approach on two language pairs and two tasks: detecting grammatical errors and predicting overall post-editing effort. Our results show that this approach not only detects grammatical errors accurately but also performs well as a quality estimation system for predicting overall post-editing effort, which is characterised by all types of MT errors. Furthermore, we show that this approach is portable to other languages.
