Machine Translation Supported by Terminological Information

It is well known that natural language (NL) is highly complex and ambiguous. Designing a system that could cope with most of the complexities of NL, in the sense of 'large scale engineering' rather than of so-called 'runnable specifications' (i.e. computational solutions to pre-selected NL problem areas), does not seem feasible in the foreseeable future. Nevertheless, it is widely recognised that systems designed for specific purposes are far more likely to be viable. In this context, however, the discipline of computational terminology has received little attention in computational linguistics, an unfortunate situation given that natural language processing (NLP) systems seem to be most successful when applied to specialised domains. In this paper we present an approach that integrates an instance of computational terminology into a constraint-based NLP/MT environment. Parts of this research were carried out in the context of the ET-10/66 project 'Terminology and Extra-linguistic Knowledge', financed by the Commission of the European Communities (CEC). As in that project, we have chosen the subject field of telecommunications as the domain of reference. The text corpus on which the work is based is the Handbook on Satellite Communication of the International Radio Consultative Committee (CCIR), an expository text. The problems our approach addresses, which are characteristic of sublanguage texts, relate to multiword term identification, domain-specific attachment of prepositional phrases, and the resolution of lexical ambiguities. The terminology knowledge used in our project to construct a terminology knowledge base was partly extracted from the information encoded in the EIRETERM term bank, which is designed primarily for human users, and is based on a linguistically motivated statistical analysis of the reference corpus.