Combining off-the-shelf components to clean a translation memory

We present a system for identifying erroneous entries in a translation memory. The system is a machine learning classifier that labels entries according to either a strict or a permissive view of correctness. It is trained on features relating to segment length, translation quality checks, and spelling and grammar errors, and it additionally uses external data to detect problems with fluency and lexical choice.
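
As a rough illustration of the kind of input such a classifier might consume, the sketch below computes a few segment-length features and simple quality checks for a single source/target pair. The feature names, the number-matching check, and the function names are assumptions made for this example; they are not the feature set of the system described above, and the spelling, grammar, fluency, and lexical-choice features are omitted.

```python
import re

# Matches integers and simple decimals such as "12", "3.5" or "1,000".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def length_features(source: str, target: str) -> dict:
    """Length-based features for one translation-memory entry (illustrative)."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    return {
        "src_chars": len(source),
        "tgt_chars": len(target),
        # Ratios far from 1.0 may hint at truncated or misaligned segments.
        "char_ratio": len(target) / max(len(source), 1),
        "token_ratio": len(tgt_tokens) / max(len(src_tokens), 1),
    }


def qa_features(source: str, target: str) -> dict:
    """Simple quality checks encoded as 0.0/1.0 values (hypothetical checks)."""
    src_nums = sorted(NUMBER_RE.findall(source))
    tgt_nums = sorted(NUMBER_RE.findall(target))
    return {
        "numbers_match": float(src_nums == tgt_nums),
        "target_empty": float(not target.strip()),
        "identical": float(source.strip() == target.strip()),
    }


def extract_features(source: str, target: str) -> dict:
    """Combine all feature groups into one dictionary per entry."""
    feats = length_features(source, target)
    feats.update(qa_features(source, target))
    return feats


if __name__ == "__main__":
    print(extract_features("Page 3 of 10", "Seite 3"))
```

A dictionary like this could be turned into a numeric vector and passed to any standard classifier, with the strict and permissive views of correctness handled as two separate labelings of the same training entries.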
