Using bilingual ETD collections to mine phrase translations

Phrase translation lists can enhance cross-language information retrieval. However, finding translations for technical phrases is difficult. Bilingual dictionaries have limited coverage of specialized fields, and even more limited coverage of technical phrases. Because phrases can have very specific meanings in technical fields, the quality of translations produced by generic machine translation systems is also limited. We hypothesize that digital libraries of electronic theses and dissertations (ETDs) are a good source of technical phrase translations. We have acquired a collection of 3,086 Spanish ETDs about computer science from Scirus, the Universidad Nacional Autónoma de México (Mexico City), and the Universidad de las Américas (Puebla). Together with English ETDs from the Networked Digital Library of Theses and Dissertations (NDLTD), this gives us a comparable corpus of computing-related documents from which to mine phrase translations. We describe our method and its formative evaluation.