A Methodology for a Semi-Automatic Evaluation of the Lexicons of Machine Translation Systems

The lexicon is a major part of any Machine Translation (MT) system. If the lexicon of an MT system is not adequate, the quality of the whole system suffers. Building a comprehensive lexicon, i.e., one with high lexical coverage, is a major activity in the process of developing a good MT system. As such, the evaluation of the lexicon of an MT system is a pivotal issue in the process of evaluating MT systems. In this paper, we introduce a new methodology that was devised to enable developers and users of MT systems to evaluate their lexicons semi-automatically. This new methodology is based on the idea of the importance of a specific word or, more precisely, word sense, to a given application domain. This importance, or weight, determines how the presence of such a word in, or its absence from, the lexicon affects the MT system's lexical quality, which in turn affects the overall output quality. The method, which adopts a black-box approach to evaluation, was implemented and applied to evaluating the lexicons of three commercial English–Arabic MT systems. A specific domain was chosen in which the various word-sense weights were determined by feeding sample texts from the domain into a system developed specifically for that purpose. Once this database of word senses and weights was built, test suites were presented to each of the MT systems under evaluation and their output was rated by a human operator as either correct or incorrect. Based on this rating, an overall automated evaluation of the lexicons of the systems was deduced.
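
The abstract does not give the weighting or scoring formulas, so the following is only an illustrative sketch of how a weight-based lexicon score could be computed, not the method the paper actually uses. It assumes that a word sense's weight is its relative frequency in the domain sample texts and that the overall score is the weighted share of rated senses the human operator marked correct. The function names `sense_weights` and `lexicon_score`, the sense labels, and the sample ratings are all hypothetical.

```python
from collections import Counter


def sense_weights(domain_senses):
    """Assumed weighting: relative frequency of each word sense
    among the sense-tagged tokens of the domain sample texts."""
    counts = Counter(domain_senses)
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()}


def lexicon_score(weights, ratings):
    """Assumed scoring: weighted share of rated senses judged correct.

    `ratings` maps a word sense to True (operator rated the MT output
    correct) or False (incorrect). Senses without a weight are ignored.
    """
    rated = [sense for sense in ratings if sense in weights]
    total = sum(weights[sense] for sense in rated)
    if total == 0:
        return 0.0
    correct = sum(weights[sense] for sense in rated if ratings[sense])
    return correct / total


# Hypothetical sense-tagged tokens from a finance-domain sample.
sample = ["bank/finance"] * 5 + ["interest/finance"] * 3 + ["bond/finance"] * 2
weights = sense_weights(sample)

# Hypothetical operator ratings for two systems on the same test suite.
system_a = {"bank/finance": True, "interest/finance": True, "bond/finance": False}
system_b = {"bank/finance": False, "interest/finance": True, "bond/finance": True}

score_a = lexicon_score(weights, system_a)
score_b = lexicon_score(weights, system_b)
```

Under these assumptions, a system that misses a frequent sense (here `bank/finance`) scores lower than one that misses a rare sense, even though both get the same number of senses wrong, which is the effect the weighting is meant to capture.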
