Automatic Translation of Biomedical Terms by Supervised Machine Learning

In this paper, we present a simple yet efficient automatic system for translating biomedical terms. It relies mainly on a machine learning approach that infers rewriting rules from pairs of terms in two languages. Given a new term, these rules are then used to transform it into its translation. Since conflicting rules may produce different translations, we also use language modeling to single out the best candidate. We report experiments on different language pairs (including Czech, English, French, Italian, German, Portuguese, Spanish and even Russian); our approach yields good results (varying with the languages considered) and outperforms existing approaches for the French-English pair.
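To make the two-stage idea concrete, here is a toy sketch that is not the system described in this paper. It assumes that rewriting rules are plain suffix substitutions, obtained by stripping the longest common prefix of each training pair, and that the language model is a character bigram model with add-one smoothing trained on target-language terms. All function names, the class name, and the tiny French-English training list are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def learn_suffix_rules(pairs):
    """Count (source_suffix, target_suffix) rules from aligned term pairs.

    Assumption: a rule is whatever remains of each term once the longest
    common prefix of the pair has been stripped.
    """
    rules = Counter()
    for src, tgt in pairs:
        i = 0
        while i < min(len(src), len(tgt)) and src[i] == tgt[i]:
            i += 1
        rules[(src[i:], tgt[i:])] += 1
    return rules


def generate_candidates(term, rules):
    """Apply every rule whose source suffix matches the end of the term."""
    return {
        term[: len(term) - len(s)] + t
        for (s, t) in rules
        if term.endswith(s)
    }


class CharBigramLM:
    """Character bigram model with add-one smoothing (illustrative only)."""

    def __init__(self, words):
        self.bigrams = Counter()
        self.context = Counter()
        self.vocab = set()
        for w in words:
            padded = "^" + w + "$"  # word-boundary markers
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.context[a] += 1

    def score(self, word):
        """Average log-probability per bigram, so long words are not penalised."""
        padded = "^" + word + "$"
        v = len(self.vocab)
        total = sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.context[a] + v))
            for a, b in zip(padded, padded[1:])
        )
        return total / (len(padded) - 1)


# Hypothetical training pairs (French, English), accents omitted for brevity.
pairs = [
    ("gastrite", "gastritis"),
    ("arthrite", "arthritis"),
    ("bronchite", "bronchitis"),
    ("cardiologie", "cardiology"),
    ("neurologie", "neurology"),
    ("anatomie", "anatomy"),
]

rules = learn_suffix_rules(pairs)
lm = CharBigramLM(tgt for _, tgt in pairs)

# Two rules match "dermatologie" ("e" -> "is" and "ie" -> "y"), so the
# language model is needed to choose between the conflicting outputs.
candidates = generate_candidates("dermatologie", rules)
best = max(candidates, key=lm.score)
```

On this toy data, both learned rules fire on "dermatologie", producing the candidates "dermatologiis" and "dermatology"; the bigram model prefers the second because its ending resembles the endings seen in the target-language training terms.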
