Combining machine readable lexical resources and bilingual corpora for broad word sense disambiguation

This paper describes a new approach to word sense disambiguation (WSD) based on an automatically acquired "word sense division." Semantically related sense entries in a bilingual dictionary are grouped into clusters by a heuristic labeling algorithm, giving a more complete and appropriate sense division for WSD. The multiple translations of each sense serve as outside information for automatically tagging bilingual corpora and acquiring WSD rules. We describe and implement a WSD method using the English-Chinese bilingual version (LecDOCE) of the Longman Dictionary of Contemporary English (LDOCE). For this purpose, we draw on information about topics and topical sets in the Longman Lexicon of Contemporary English (LLOCE) to represent and disambiguate LecDOCE senses. Example sentences and their translations from LecDOCE are used as training material for WSD, and further examples from the Brown corpus are used for testing. Quantitative results for disambiguating 12 words are also presented.
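The sketch below shows how sense translations can act as outside information for tagging a bilingual corpus. Treat it as a minimal illustration of the general idea, not as the method described in this paper. Several parts are hypothetical: the sense labels, the Chinese translations, the example sentences, and the helpers `tag_sense`, `learn_rules` and `disambiguate`. The abstract also does not say what form the WSD rules take, so the sketch uses naive bag-of-words co-occurrence counts in their place.

```python
from collections import Counter

# Hypothetical sense inventory: each sense cluster (label invented for
# illustration) maps to Chinese translations listed for it in a bilingual dictionary.
SENSE_TRANSLATIONS = {
    "bank/money": {"银行"},
    "bank/river": {"岸", "河岸", "堤"},
}

def tag_sense(senses, chinese_sentence):
    """Tag a sentence pair with the single sense whose translation appears in
    the Chinese side; return None if zero or several senses match."""
    matches = [s for s, trans in senses.items()
               if any(t in chinese_sentence for t in trans)]
    return matches[0] if len(matches) == 1 else None

def learn_rules(tagged, target):
    """Naive stand-in for rule acquisition: count English context words per sense."""
    rules = {}
    for tokens, sense in tagged:
        rules.setdefault(sense, Counter()).update(t for t in tokens if t != target)
    return rules

def disambiguate(rules, tokens, target):
    """Pick the sense whose context counts best overlap the new sentence
    (ties go to the sense learned first); None if nothing overlaps."""
    context = set(tokens) - {target}
    scores = {s: sum(c[t] for t in context) for s, c in rules.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Usage with invented bilingual example sentences.
examples = [
    ("he works at the bank in town".split(), "他在城里的银行工作。"),
    ("we walked along the bank of the river".split(), "我们沿着河岸散步。"),
]
tagged = []
for tokens, zh in examples:
    sense = tag_sense(SENSE_TRANSLATIONS, zh)
    if sense is not None:
        tagged.append((tokens, sense))
rules = learn_rules(tagged, target="bank")
print(disambiguate(rules, "fishing from the river bank".split(), target="bank"))
# prints bank/river
```

Tagging is done only when exactly one sense matches, so ambiguous sentence pairs are left out of training rather than mislabeled. A real system would also have to handle Chinese word segmentation and stop words, which this toy version ignores.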