Word-Pair Extraction for Lexicography

We describe an application of sentence alignment techniques and approximate string matching to the problem of extracting lexicographically interesting word-word pairs from multilingual corpora. Since our interest is in support systems for lexicographers rather than in fully automatic construction of lexicons, we want to give users access to parameters that allow a tunable trade-off between precision and recall. We evaluate two techniques for doing this. Because sentence alignment tends to associate semantically similar words, while approximate string matching draws attention to orthographic similarities, the two techniques can serve different lexicographic purposes, as can their combination, which amounts, inter alia, to a tool for uncovering faux amis. We conclude by sketching a simple and flexible means for allowing lexicographers to provide information that has the potential to improve system performance.
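As a rough illustration of the approximate-string-matching side only, the sketch below scores candidate word pairs by normalized Levenshtein distance and keeps those at or above a cut-off. The choice of Levenshtein distance, the function names, and the `threshold` parameter (standing in for a precision/recall knob: raising it favours precision, lowering it favours recall) are assumptions for illustration, not the method the paper describes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Orthographic similarity in [0, 1]: 1 minus length-normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def orthographic_pairs(source_words, target_words, threshold=0.7):
    """Hypothetical helper: all cross-language pairs scoring >= threshold, best first.

    `threshold` is an assumed tuning parameter, not one taken from the paper.
    """
    pairs = []
    for s in source_words:
        for t in target_words:
            score = similarity(s.lower(), t.lower())
            if score >= threshold:
                pairs.append((s, t, score))
    pairs.sort(key=lambda p: -p[2])
    return pairs


if __name__ == "__main__":
    # English "library" vs. French "librairie" (bookshop) is a classic faux ami.
    for s, t, score in orthographic_pairs(["library", "dog"], ["librairie", "chien"], threshold=0.6):
        print(s, t, round(score, 3))
```

Pairs that score high here but are not associated by sentence alignment would be natural faux-amis candidates for a lexicographer to inspect.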