Paper Information - Learning Semantic Representations for the Phrase Translation Model

This paper presents a novel semantic-based phrase translation model. A pair of source and target phrases is projected into continuous-valued vector representations in a low-dimensional latent semantic space, where the translation score is computed from the distance between the two vectors. The projection is performed by a multi-layer neural network whose weights are learned on parallel training data. Training aims to directly optimize the quality of end-to-end machine translation results. The model was evaluated on two Europarl translation tasks, English-French and German-English. The results show that the new semantic-based phrase translation model significantly improves the performance of a state-of-the-art phrase-based statistical machine translation system, leading to a gain of 0.7-1.0 BLEU points.
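
The scoring idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the bag-of-words input encoding, the toy vocabularies, the layer sizes, the tanh activation, the random (untrained) weights, and the use of negative Euclidean distance as the score. The helper names (`make_layer`, `project`, `bag_of_words`, `translation_score`) are hypothetical, and the BLEU-oriented training step is not shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random n_out x n_in weight matrix; a stand-in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def bag_of_words(phrase, vocab):
    """Count-based phrase vector (assumed input encoding)."""
    vec = [0.0] * len(vocab)
    for word in phrase.split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def project(vec, layers):
    """Pass a vector through successive tanh layers to get a latent vector."""
    h = vec
    for weights in layers:
        h = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]
    return h


def translation_score(src_vec, tgt_vec):
    """Negative Euclidean distance, so closer pairs score higher (assumed measure)."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(src_vec, tgt_vec)))


rng = random.Random(0)
src_vocab = {"the": 0, "green": 1, "house": 2}   # toy vocabularies
tgt_vocab = {"la": 0, "maison": 1, "verte": 2}

# Separate two-layer networks for each language: 3 inputs -> 4 hidden -> 2 latent.
src_layers = [make_layer(3, 4, rng), make_layer(4, 2, rng)]
tgt_layers = [make_layer(3, 4, rng), make_layer(4, 2, rng)]

src = project(bag_of_words("the green house", src_vocab), src_layers)
tgt = project(bag_of_words("la maison verte", tgt_vocab), tgt_layers)
score = translation_score(src, tgt)
print(score)
```

With untrained weights the score carries no meaning. In the model described above, the weights would be learned from parallel data so that good translation pairs end up close together in the latent space.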
