Topic Modeling-based Domain Adaptation for System Combination

This paper describes the system submitted by the Dublin City University domain adaptation team to the system combination task of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). We used the results of unsupervised document classification as meta-information for the system combination module. On the Spanish-English data, our strategy achieved 26.33 BLEU points, an absolute improvement of 0.33 BLEU points over standard confusion-network-based system combination. This was the best BLEU score among the six participants in ML4HMT-12.
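The idea of using unsupervised document classification as meta-information can be illustrated with a minimal sketch. It assumes the classifier is a small collapsed Gibbs sampler for Latent Dirichlet Allocation and that each document is routed to its dominant topic, so that a downstream combination step could treat each topic group separately. The function names, hyperparameters, and toy data are all hypothetical; this is not the configuration used in the submitted system.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the paper's setup).

    docs is a list of token lists. Returns, for each document, the number of
    tokens assigned to each topic after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic


def partition_by_topic(doc_topic):
    """Group document indices by dominant topic (ties go to the lowest id)."""
    groups = defaultdict(list)
    for d, counts in enumerate(doc_topic):
        groups[counts.index(max(counts))].append(d)
    return dict(groups)


# Hypothetical toy corpus: two finance-like and two sports-like documents.
toy_docs = [
    "bank loan interest rate bank".split(),
    "match goal team player goal".split(),
    "interest rate market bank loan".split(),
    "team player match season goal".split(),
]
toy_groups = partition_by_topic(lda_gibbs(toy_docs, num_topics=2))
```

Under this assumption, each entry of `toy_groups` is a list of document indices that share a dominant topic. A combination module could then be tuned or run per group, which is one plausible reading of "meta-information" here.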
