Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian

The increasing availability of parallel bilingual corpora, together with automatic methods and tools, makes it possible to build linguistic and terminological resources for low-resourced languages. We propose to exploit corpora available in several languages to build bilingual and trilingual terminologies. Terminological information extracted in better-resourced languages is associated with the corresponding units in lower-resourced languages through cross-lingual transfer. The method is applied to corpora that include the Ukrainian language. In our experiments, the precision of term extraction varies between 0.454 and 0.966, while the quality of the interlingual relations varies between 0.309 and 0.965. The resulting resource contains 4,588 medical terms in Ukrainian and 34,267 relations linking them to French and English terms.
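To make the transfer step concrete, the following is a minimal sketch of how a term found in a better-resourced language could be projected onto an aligned Ukrainian sentence. It is an illustration under stated assumptions, not the authors' implementation: the function names `parse_alignment` and `project_term`, the `i-j` alignment string format, the contiguity filter, and the example sentence pair are all hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse word alignments written as 'i-j' pairs (source index, target index)."""
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def project_term(
    src_span: Tuple[int, int],
    alignment: List[Tuple[int, int]],
    tgt_tokens: List[str],
) -> Optional[str]:
    """Project a source term span (start inclusive, end exclusive) onto the target.

    Returns the target words aligned to the span, or None when nothing is
    aligned or when the aligned target words are not contiguous (an assumed
    filter for this sketch, used to discard noisy projections).
    """
    start, end = src_span
    tgt_idx = sorted({j for i, j in alignment if start <= i < end})
    if not tgt_idx:
        return None
    if tgt_idx[-1] - tgt_idx[0] + 1 != len(tgt_idx):
        return None
    return " ".join(tgt_tokens[tgt_idx[0]:tgt_idx[-1] + 1])


# Illustrative sentence pair and alignment (invented for this example).
en_tokens = ["the", "patient", "has", "chronic", "heart", "failure"]
uk_tokens = ["пацієнт", "має", "хронічну", "серцеву", "недостатність"]
links = parse_alignment("1-0 2-1 3-2 4-3 5-4")

# The English term "chronic heart failure" covers source tokens 3..5.
candidate = project_term((3, 6), links, uk_tokens)
print(candidate)
```

The projected candidate is an inflected surface form as it occurs in the sentence; turning it into a terminology entry would still require normalization such as lemmatization, which this sketch does not attempt.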
