Bilingual Dictionary Drafting. The Example of German-Basque, a Medium-density Language Pair

This paper presents a set of Bilingual Dictionary Drafting (BDD) methods, including manual extraction from existing lexical databases and corpus-based NLP tools, and evaluates them using German-Basque as an example language pair. Our aim is twofold: to support a German-Basque bilingual dictionary project by providing draft bilingual glossaries, and to give lexicographers insight into how useful BDD methods are. Results show that the analysed methods can greatly assist bilingual dictionary writing in the context of medium-density language pairs.
