Word sense disambiguation in text-to-pictograph translation
We describe the implementation and evaluation of a word sense disambiguation (WSD) tool in a translation system that converts English text messages into sequences of pictographic images. The Text-to-Picto tool for Dutch, English, and Spanish is used on the online communication platform “WAI-NOT” by people who have trouble reading and writing. The translation system relies on WordNets in which synsets are populated with pictographs. In the original system, many ambiguous words are translated into an incorrect pictograph because the pictograph is linked to the wrong word sense. The WSD method required for our translation engine must work on general-domain text and use WordNet sense inventories. We opted for the gloss-overlap-based extended Lesk algorithm described by Banerjee and Pedersen (2002). During translation, each possible WordNet synset of every content word in the input sentence receives a disambiguation score. This score, alongside other parameters, is used in a path-finding algorithm to determine the optimal pictograph sequence. This implementation approach is easily generalised to other sense-labelling algorithms, such as an SVM-based WSD tool for Dutch (Izquierdo 2015). In the evaluation of the translation output, we did not obtain an improvement over the baseline system without WSD. However, we found that WSD works well for ambiguous words for which sufficient pictographs are linked in our lexical-pictorial database.
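To make the idea of a per-synset disambiguation score concrete, the following is a minimal sketch of gloss-overlap scoring in Python. It is a simplified Lesk variant, not the extended Lesk algorithm of Banerjee and Pedersen (2002), which also draws on the glosses of related synsets and weights multi-word overlaps more heavily. The sense inventory `TOY_GLOSSES`, the sense identifiers, the stopword list, and the function names are all hypothetical and chosen for illustration only; they are not taken from the Text-to-Picto system.

```python
# Simplified gloss-overlap scoring (illustrative sketch, not the system's code).

# Hypothetical stopword list: function words ignored when comparing glosses.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "or", "and",
             "that", "is", "with", "as", "by", "such"}

# Hypothetical toy sense inventory: word -> {sense id: gloss}.
TOY_GLOSSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "financial institution that accepts deposits of money",
    },
    "river": {
        "river.n.01": "large natural stream of water",
    },
    "money": {
        "money.n.01": "currency accepted as a medium of exchange "
                      "by a financial institution",
    },
}


def content_tokens(text):
    """Return the set of lowercased, non-stopword tokens in a gloss."""
    return {tok for tok in text.lower().split() if tok not in STOPWORDS}


def score_senses(target, context_words, glosses=TOY_GLOSSES):
    """Score every sense of `target` by word overlap with its context.

    The context bag holds the context words themselves plus the content
    tokens of all their glosses. A sense's score is the number of content
    tokens in its gloss that also occur in that bag.
    """
    context_bag = set()
    for word in context_words:
        context_bag.add(word.lower())
        for gloss in glosses.get(word, {}).values():
            context_bag |= content_tokens(gloss)

    return {
        sense: len(content_tokens(gloss) & context_bag)
        for sense, gloss in glosses.get(target, {}).items()
    }


if __name__ == "__main__":
    print(score_senses("bank", ["river"]))
    print(score_senses("bank", ["money"]))
```

In this sketch the scores are returned for all candidate senses rather than reduced to a single winner, which mirrors the description above: a downstream step (in the paper, a path-finding algorithm) can then combine each score with other parameters before choosing a pictograph.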