Cross-lingual Terminology Extraction for Translation Quality Estimation

We explore ways of identifying terms in monolingual texts and use them to investigate the contribution of terminology to translation quality. We propose a supervised learning method that uses common statistical measures of termhood and unithood as features to train classifiers for identifying terms in cross-domain and cross-language settings. On this basis, word sequences from source texts (STs) and target texts (TTs) are aligned naively through a fuzzy matching mechanism to identify correctly translated term equivalents in student translations. Correlation analyses further show that normalized term occurrences in translations have a weak linear relationship with translation quality in terms of usefulness/transfer, terminology/style, idiomatic writing, and target mechanics, and a near- and above-strong relationship with overall translation quality. The method has demonstrated some reliability in automatically identifying terms in human translations. However, drawbacks in handling low-frequency terms and term variations shall be dealt with in the
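The fuzzy matching step and the normalized term occurrence count can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.8 similarity threshold, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and normalization by token count are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_contains(term, tokens, threshold=0.8):
    """Return True if some window of `tokens` is similar enough to `term`.

    Hypothetical helper: slides a window with as many tokens as the term
    over the text and compares character-level similarity.
    """
    n = len(term.split())
    target = term.lower()
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n]).lower()
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False


def normalized_term_occurrences(reference_terms, translation, threshold=0.8):
    """Count reference terms found (fuzzily) in a translation.

    Assumption: the count is normalized by the number of whitespace
    tokens in the translation; the paper may normalize differently.
    """
    tokens = translation.split()
    if not tokens:
        return 0.0
    hits = sum(fuzzy_contains(t, tokens, threshold) for t in reference_terms)
    return hits / len(tokens)


# Toy data: expected target-side term equivalents and one student translation.
terms = ["neural network", "parallel corpus", "translation memory"]
student_tt = "The neural networks were trained on parallel corpora ."
score = normalized_term_occurrences(terms, student_tt)
```

In this toy example the inflected forms "neural networks" and "parallel corpora" still match their reference terms, which is the kind of term variation a fuzzy matcher is meant to tolerate. The resulting per-translation scores could then be correlated with human quality ratings.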
