Disambiguation of single noun translations extracted from bilingual comparable corpora

Bilingual machine-readable dictionaries are indispensable resources for cross-language information retrieval and machine translation. Recently, these cross-language activities have begun to focus on specific academic or technological domains. In this paper, we describe a bilingual dictionary acquisition system that extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. The proposed method has two stages. In the first stage, candidate terms are extracted from a Japanese corpus and an English corpus, respectively, and ranked according to their importance as terms. In the second stage, ambiguous translations are resolved by selecting the target-language translation that is nearest in rank to the source-language term. Finally, we evaluate the proposed method experimentally.
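The two-stage selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The term-importance measure is not specified here, so the hypothetical `rank_terms` helper uses raw frequency as a stand-in. The bilingual dictionary is modeled as a plain list of candidate translations. Ranks are compared directly, without normalizing for corpus size. All function names and the toy data are hypothetical.

```python
from collections import Counter


def rank_terms(candidates):
    """Stage 1 (sketch): rank the candidate terms of one corpus by importance.

    ASSUMPTION: raw frequency stands in for the unspecified term-importance
    measure. Rank 1 is the most important term; ties are broken
    alphabetically so the ranking is deterministic.
    """
    counts = Counter(candidates)
    ordered = sorted(counts, key=lambda term: (-counts[term], term))
    return {term: position + 1 for position, term in enumerate(ordered)}


def disambiguate(source_term, translations, source_ranks, target_ranks):
    """Stage 2: pick the candidate translation nearest in rank to the source term.

    Candidates absent from the target corpus are skipped, and None is
    returned if no candidate was ranked. Equal rank distances fall back
    to alphabetical order.
    """
    source_rank = source_ranks[source_term]
    ranked = [t for t in translations if t in target_ranks]
    if not ranked:
        return None
    return min(ranked, key=lambda t: (abs(target_ranks[t] - source_rank), t))


if __name__ == "__main__":
    # Toy, made-up corpora, given as lists of already-extracted candidate terms.
    ja_ranks = rank_terms(["解析"] * 5 + ["文書"] * 3 + ["検索"] * 2)
    en_ranks = rank_terms(
        ["parsing"] * 6 + ["document"] * 4 + ["retrieval"] * 3 + ["analysis"]
    )
    # Hypothetical dictionary entry with two candidate translations.
    print(disambiguate("解析", ["analysis", "parsing"], ja_ranks, en_ranks))  # parsing
```

In this toy data, "解析" is ranked 1st in the Japanese list. "parsing" is ranked 1st and "analysis" 4th in the English list, so "parsing" is selected. Comparable corpora can yield term lists of very different lengths. A variant that compares ranks normalized by list length may therefore be more robust, but that is a design choice beyond what is described here.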
