DCU at NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) Task

DCU participated in the English to Chinese (E2C) and Chinese to English (C2E) subtasks of the NTCIR-10 CrossLink-2 Cross-lingual Link Discovery (CLLD) task. Our strategy for each query involved extracting potential link anchors as n-gram strings, cleaning the potential anchor strings, and expanding and ranking the anchors to select a set of anchors for the query. Potential anchors were translated using Google Translate, and a standard information retrieval technique was used to create links: for each anchor, the top 5 ranked retrieved items were selected as potential links. We submitted a total of four runs for E2C CLLD and C2E CLLD. We describe our method and results for file-to-file level and anchor-to-file level evaluation.
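As a rough illustration of the first step, extracting potential link anchors as n-gram strings, the following minimal sketch enumerates word n-grams from a query text. It is not the system's actual implementation: the function name `candidate_anchors`, the `max_n` parameter, and whitespace tokenization are assumptions made for the example, and Chinese text would need a word segmenter or character n-grams instead.

```python
from typing import List


def candidate_anchors(text: str, max_n: int = 3) -> List[str]:
    """Return all word n-grams of length 1..max_n as candidate anchor strings.

    Assumption: tokens are separated by whitespace (suitable for English only).
    """
    tokens = text.split()
    candidates = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            candidates.append(" ".join(tokens[i:i + n]))
    return candidates
```

In a pipeline like the one described, candidates produced this way would then be cleaned, expanded, and ranked before translation and retrieval.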