Word alignment and smoothing methods in statistical machine translation: Noise, prior knowledge and overfitting

This thesis discusses how to incorporate linguistic knowledge into a statistical machine translation (SMT) system. One important category of linguistic knowledge comes from a constituent or dependency parser, a part-of-speech (POS) tagger or supertagger, and a morphological analyser. Here, however, linguistic knowledge covers a broader range: Multi-Word Expressions (MWEs), Out-Of-Vocabulary (OOV) words, paraphrases, lexical semantics (or non-literal translations), named entities, coreference, and transliterations. The first part addresses word alignment, for which we propose an MWE-sensitive word aligner. The second part addresses smoothing methods for a language model and a translation model, for which we propose a smoothing method based on the hierarchical Pitman-Yor process. Both parts examine three exceptional cases that arise in real-world data: the presence of noise, the availability of prior knowledge, and the problem of underfitting. A notable characteristic of this design is the careful use of (Bayesian) priors, so that the models capture both frequent and linguistically important phenomena. This can be seen as one example of how to address a common weakness of statistical models, which often learn only from frequent examples and overlook less frequent but linguistically important phenomena.
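Hierarchical Pitman-Yor smoothing can be illustrated with the predictive probability of a hierarchical Pitman-Yor n-gram language model. The sketch below assumes the one-table-per-word-type approximation. Under that approximation, with strength θ = 0, the model reduces to interpolated Kneser-Ney smoothing. The class name `HPYLMApprox`, the single discount `d` and strength `θ` shared by all context lengths, and the uniform base distribution over a fixed vocabulary are illustrative assumptions. This is not the thesis's own implementation, which would infer seating arrangements (for example by sampling) rather than fixing one table per word type.

```python
from collections import defaultdict


class HPYLMApprox:
    """Predictive probabilities of a hierarchical Pitman-Yor n-gram language
    model under the one-table-per-word-type approximation (hypothetical helper)."""

    def __init__(self, order, vocab_size, discount=0.75, strength=0.0):
        # Pitman-Yor parameters must satisfy 0 <= d < 1 and theta > -d.
        if not 0.0 <= discount < 1.0 or strength <= -discount:
            raise ValueError("need 0 <= discount < 1 and strength > -discount")
        self.order = order            # n of the n-gram model
        self.vocab_size = vocab_size  # size of the uniform base distribution
        self.d = discount             # discount d, shared by all context lengths (assumption)
        self.theta = strength         # strength theta, shared by all context lengths (assumption)
        # counts[u][w] = number of customers eating word w in restaurant (context) u
        self.counts = defaultdict(lambda: defaultdict(int))

    def _add_customer(self, u, w):
        self.counts[u][w] += 1
        # One table per word type: only the first customer for w in u opens a
        # table, and each new table sends one customer to the parent context u[1:].
        if self.counts[u][w] == 1 and len(u) > 0:
            self._add_customer(u[1:], w)

    def train(self, sentences):
        n = self.order
        for sent in sentences:
            tokens = ["<s>"] * (n - 1) + list(sent) + ["</s>"]
            for i in range(n - 1, len(tokens)):
                # Observed n-grams seat customers only in the longest contexts.
                self._add_customer(tuple(tokens[i - n + 1:i]), tokens[i])

    def prob(self, w, context=()):
        # Keep only the last n-1 words of the context.
        u = tuple(context)[-(self.order - 1):] if self.order > 1 else ()
        return self._prob(w, u)

    def _prob(self, w, u):
        # Parent distribution: the context with its earliest word dropped,
        # or the uniform base distribution below the empty context.
        parent = 1.0 / self.vocab_size if len(u) == 0 else self._prob(w, u[1:])
        table = self.counts.get(u)
        if not table:
            return parent  # unseen context: back off entirely to the parent
        c_w = table.get(w, 0)          # customers eating w in u
        c_total = sum(table.values())  # all customers in u
        t_w = 1 if c_w > 0 else 0      # tables serving w (one per type)
        t_total = len(table)           # all tables in u
        return (c_w - self.d * t_w + (self.theta + self.d * t_total) * parent) / (
            self.theta + c_total
        )


if __name__ == "__main__":
    lm = HPYLMApprox(order=2, vocab_size=4)  # vocabulary: a, b, c, </s>
    lm.train([["a", "b"], ["a", "c"]])
    print(lm.prob("b", ["a"]))  # approximately 0.275
```

In this toy example, the unigram level receives one customer per distinct (context, word) pair rather than raw counts. That is where the Kneser-Ney-style continuation counts come from. The discount `d` also moves probability mass toward the shorter context in proportion to the number of word types seen, so rare but plausible words keep non-zero probability.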
