Although WordNets have been developed for a number of languages, no attempt to construct a Japanese WordNet is known. We therefore launched a project to automatically translate the Princeton WordNet into Japanese using unsupervised word-sense disambiguation based on bilingual comparable corpora. The proposed method aligns English word associations with Japanese ones and iteratively calculates a correlation matrix between the Japanese translations of an English word and its associated words. It then determines the Japanese translation of the English word in a synset by scoring the translation candidates according to the correlation matrix and the associated words that appear in the gloss attached to the synset. On its own, this method is not robust because a gloss contains only a few associated words. To overcome this difficulty, we extended the method so that it retrieves texts using the gloss as a query and uses the retrieved texts, together with the gloss, to score the translation candidates. A preliminary experiment using Wall Street Journal and Nihon Keizai Shimbun corpora demonstrated that the proposed method is promising for constructing a Japanese WordNet.
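The scoring step can be illustrated with a minimal sketch. It is not the authors' implementation: the correlation values are invented toy numbers, the function names `score_candidates` and `pick_translation` are hypothetical, and a plain sum of correlations stands in for whatever scoring formula the method actually uses. The sketch assumes the correlation matrix has already been computed and only shows how adding retrieved texts to the gloss can separate candidates that the gloss alone cannot.

```python
def score_candidates(correlation, words):
    """Score each translation candidate by summing its correlation with the given words.

    correlation: dict mapping a Japanese candidate to a dict of
                 {associated English word: correlation value} (toy values here).
    words: associated words observed in the gloss and/or retrieved texts.
    """
    return {
        candidate: sum(assoc.get(word, 0.0) for word in words)
        for candidate, assoc in correlation.items()
    }


def pick_translation(correlation, gloss_words, retrieved_texts=()):
    """Pick the best-scoring candidate using the gloss plus any retrieved texts."""
    words = list(gloss_words)
    for text in retrieved_texts:  # each text is a list of words
        words.extend(text)
    scores = score_candidates(correlation, words)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical correlation matrix for the English word "bank".
correlation = {
    "銀行": {"money": 0.9, "deposit": 0.8, "river": 0.1},   # financial institution
    "土手": {"river": 0.9, "slope": 0.7, "money": 0.05},    # river bank
}

# A gloss that happens to contain strongly associated words.
best_fin, scores_fin = pick_translation(correlation, ["financial", "deposit", "money"])

# A short gloss with no associated words: both candidates score zero.
_, scores_gloss_only = pick_translation(correlation, ["sloping", "land"])

# The same gloss extended with a text retrieved by using the gloss as a query.
best_river, scores_river = pick_translation(
    correlation, ["sloping", "land"], retrieved_texts=[["river", "slope", "water"]]
)
```

In this toy setting the short gloss gives every candidate a score of zero, which mirrors the robustness problem described above, while the retrieved text supplies enough associated words to favor one candidate.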