In this paper, we describe our approach to the English-Korean Cross-Lingual Link Discovery (CLLD) task at NTCIR-9. We propose a simple and effective approach to discovering links. Our method comprises preprocessing, anchor-target link mapping, and ranking steps. To discover links, we use English anchor names, Wikipedia inter-language links, and translations from Google Translate as features, and we extract candidate links by exact matching among them. Our method then ranks the anchor candidates using Wikipedia category sets and PageRank, and selects Korean target pages using the mutual information between English anchors and the Korean titles of Wikipedia articles. In the official file-to-file evaluation with manual assessment, our system achieved a precision at 10 (P10) of 0.6 to 0.7, which indicates that this approach achieves satisfactory results.
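The following minimal Python sketch illustrates two of these stages under stated assumptions. It is not the authors' implementation. The function names, the `langlinks` map (English title to Korean title), the `translations` map (a stand-in for Google Translate output), and the toy link graph are all hypothetical. The exact-match rule is one plausible reading of the description above: an anchor that is an English title with an inter-language link takes that Korean page, and otherwise a translation is accepted only if it exactly matches an existing Korean title. Ranking uses plain power-iteration PageRank. Category-set ranking and mutual-information target selection are not modeled.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the teleportation share (1 - d) / n.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


def extract_links(anchors, langlinks, translations, korean_titles):
    """Map English anchors to Korean titles by exact matching (assumed rule)."""
    links = {}
    for anchor in anchors:
        if anchor in langlinks:
            # Anchor is an English title with an inter-language link.
            links[anchor] = langlinks[anchor]
        else:
            # Fall back to a machine translation that exactly matches a Korean title.
            translated = translations.get(anchor)
            if translated in korean_titles:
                links[anchor] = translated
    return links


def rank_anchors(links, graph):
    """Order linked anchors by the PageRank of the English page they name."""
    scores = pagerank(graph)
    return sorted(links, key=lambda a: scores.get(a, 0.0), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    anchors = ["Seoul", "Kimchi", "Hangul", "Banana split"]
    langlinks = {"Seoul": "서울특별시", "Hangul": "한글"}
    translations = {"Kimchi": "김치", "Banana split": "바나나 스플릿"}
    korean_titles = {"서울특별시", "한글", "김치"}
    graph = {"Seoul": ["Hangul", "Kimchi"], "Kimchi": ["Seoul"], "Hangul": ["Seoul"]}

    links = extract_links(anchors, langlinks, translations, korean_titles)
    for anchor in rank_anchors(links, graph):
        print(anchor, "->", links[anchor])
```

In this toy run, "Banana split" is dropped because its translation matches no Korean title, and "Seoul" ranks first because both other pages link to it. Exact matching keeps precision high at the cost of recall, which fits a precision-oriented measure such as P10.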