Automatically Creating a Large Number of New Bilingual Dictionaries

This paper proposes approaches for automatically creating a large number of new bilingual dictionaries for low-resource languages, especially resource-poor and endangered ones, from a single input bilingual dictionary. Our algorithms translate words in a source language into many target languages using available Wordnets and a machine translation (MT) system. Because our approaches rely only on one input dictionary, available Wordnets, and an MT system, they are applicable to any bilingual dictionary as long as one of its two languages is English or has a Wordnet linked to the Princeton WordNet. Starting with 5 available bilingual dictionaries, we create 48 new bilingual dictionaries. Of these, 30 language pairs are not supported by the popular MT services Google Translate and Bing Translator.
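To make the Wordnet pivot idea concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm. The sketch takes a source word, looks up its English glosses in the input dictionary, maps each gloss to Princeton WordNet synsets, and collects the lemmas that a target-language Wordnet linked to those synsets provides. Candidates are then ranked by how many pivot paths support them. The toy dictionaries, synset IDs (`s_bank_river`, etc.), the function name `translate_via_wordnet`, and the vote-count ranking are all hypothetical illustrations. The MT component mentioned in the abstract is omitted.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy input bilingual dictionary: source word -> English glosses.
SRC_TO_EN: Dict[str, List[str]] = {
    "src_word": ["bank", "shore"],
}

# Hypothetical stand-in for Princeton WordNet: English lemma -> synset IDs.
EN_LEMMA_TO_SYNSETS: Dict[str, List[str]] = {
    "bank": ["s_bank_river", "s_bank_finance"],
    "shore": ["s_bank_river", "s_shore"],
}

# Hypothetical stand-in for a target-language Wordnet linked to PWN:
# synset ID -> target-language lemmas (toy data, not a real Wordnet).
SYNSET_TO_TARGET: Dict[str, List[str]] = {
    "s_bank_river": ["rive", "berge"],
    "s_bank_finance": ["banque"],
    "s_shore": ["rive", "littoral"],
}


def translate_via_wordnet(
    word: str,
    src_to_en: Dict[str, List[str]],
    en_to_synsets: Dict[str, List[str]],
    synset_to_target: Dict[str, List[str]],
    top_k: int = 3,
) -> List[str]:
    """Rank target lemmas by the number of (gloss, synset) paths reaching them."""
    votes: Counter = Counter()
    for gloss in src_to_en.get(word, []):
        for synset in en_to_synsets.get(gloss, []):
            for lemma in synset_to_target.get(synset, []):
                votes[lemma] += 1
    # Sort by descending vote count, breaking ties alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lemma for lemma, _ in ranked[:top_k]]


if __name__ == "__main__":
    print(translate_via_wordnet(
        "src_word", SRC_TO_EN, EN_LEMMA_TO_SYNSETS, SYNSET_TO_TARGET))
```

With the toy data, "rive" is reached through three paths and "berge" through two, so they outrank "banque", which comes only from the financial sense of "bank". Counting agreement across multiple glosses in this way is one simple heuristic for filtering out translations of unrelated senses.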
