Incorporating Syntactic Knowledge in Neural Quality Estimation for Machine Translation

Translation quality estimation aims to evaluate machine translation output without reference translations. State-of-the-art neural quality estimation methods can implicitly learn some syntactic information from sentence-aligned parallel corpora. However, they still fail to capture the deep structural syntactic details of sentences. This paper proposes a method that explicitly incorporates source syntax into neural quality estimation. Specifically, the parse trees of source sentences are linearized, and the resulting label sequences are combined with the source sequence through hierarchical encoding to obtain a more complete and deeper source encoding vector. The hidden relationships between the source syntactic structure and translation quality are then modeled to detect syntactic errors in the translation. Experimental results on the WMT17 quality estimation datasets show that syntactic knowledge improves both the sentence-level Pearson correlation and the word-level F1-mult score.
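
The linearization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact linearization scheme, so the code below assumes a Penn Treebank-style bracketed constituency parse and a simple depth-first traversal; the function name `linearize_tree` and the `(X` / `)X` label format are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def linearize_tree(bracketed: str) -> Tuple[List[str], List[str]]:
    """Linearize a bracketed constituency parse by depth-first traversal.

    Returns (mixed, labels):
      mixed  - opening labels, words, and closing labels in tree order
      labels - the same sequence with the words removed
    Assumption: input looks like "(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))".
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    mixed: List[str] = []
    labels: List[str] = []
    stack: List[str] = []  # labels of constituents that are still open
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if i + 1 >= len(tokens) or tokens[i + 1] in ("(", ")"):
                raise ValueError("opening bracket without a label")
            label = tokens[i + 1]
            stack.append(label)
            mixed.append("(" + label)
            labels.append("(" + label)
            i += 2
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            label = stack.pop()
            mixed.append(")" + label)
            labels.append(")" + label)
            i += 1
        else:
            mixed.append(tok)  # a source word (leaf of the tree)
            i += 1
    if stack:
        raise ValueError("unbalanced opening bracket")
    return mixed, labels


if __name__ == "__main__":
    mixed, labels = linearize_tree("(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))")
    print(" ".join(mixed))
    print(" ".join(labels))
```

In a setup like the one the abstract describes, a label sequence of this kind would be encoded alongside the source words; how the two encodings are combined hierarchically is not detailed in the abstract and is not modeled here.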
