Cross-lingual Link Discovery Based on CRF Model for NTCIR-10 CrossLink

This paper describes our participation in the Chinese-to-English (C2E) subtask of the NTCIR-10 Cross-lingual Link Discovery task. The task focuses on creating suitable links from terms in Chinese, Japanese, or Korean Wikipedia articles to English Wikipedia articles. We propose a two-stage method for the C2E subtask, dividing it into “Anchor Recognition” and “CrossLink”. In the first stage, we use a conditional random field (CRF), a machine learning method, to recognize every potential anchor that could link to an article in the target language. In the second stage, we find candidate links for these anchors and then disambiguate them. According to the official results, our system achieved an LMAP score of 0.072 when evaluated against the Wikipedia ground truth and 0.027 under manual assessment.
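
The two stages can be illustrated with a minimal sketch. It is not the system evaluated here: it assumes the CRF emits one B/I/O label per Chinese character, and it replaces the disambiguation step with a simple highest-score choice. The function names, the candidate table, and its scores are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Optional, Tuple


def extract_anchors(chars: List[str], tags: List[str]) -> List[str]:
    """Stage 1 (post-processing): turn per-character B/I/O labels,
    as a sequence labeller such as a CRF might produce, into anchor strings."""
    anchors: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new anchor starts; close any anchor still open.
            if current:
                anchors.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            # Continue the anchor that is currently open.
            current.append(ch)
        else:
            # "O" (or a stray "I" with no open anchor) ends the current anchor.
            if current:
                anchors.append("".join(current))
            current = []
    if current:
        anchors.append("".join(current))
    return anchors


def link_anchor(
    anchor: str, candidates: Dict[str, List[Tuple[str, float]]]
) -> Optional[str]:
    """Stage 2: pick the candidate English title with the highest score.
    The score is a stand-in for whatever disambiguation evidence is used."""
    options = candidates.get(anchor, [])
    if not options:
        return None
    return max(options, key=lambda pair: pair[1])[0]


# Hypothetical input: one label per character of a short Chinese sentence.
CHARS = list("苹果公司发布新手机")
TAGS = ["B", "I", "I", "I", "O", "O", "O", "B", "I"]

# Hypothetical candidate table: anchor -> (English title, made-up score).
CANDIDATES = {
    "苹果公司": [("Apple Inc.", 0.9), ("Apple", 0.1)],
    "手机": [("Mobile phone", 0.8)],
}

if __name__ == "__main__":
    for anchor in extract_anchors(CHARS, TAGS):
        print(anchor, "->", link_anchor(anchor, CANDIDATES))
```

Keeping anchor extraction separate from link selection mirrors the split between “Anchor Recognition” and “CrossLink”, so either stage could be swapped out independently in such a sketch.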