Recent developments in Machine Translation: a review of the last five years

multilevel tree representations which combine syntactic, logical and semantic relationships), lexical transfer (lexical substitution with some structural changes), structural transfer (tree transduction), syntactic generation, morphological generation (trees to strings). The long-term aim of the GETA project is a multilingual system producing 'good enough' results, i.e. accepting the need for post-editing. The system is essentially, like Eurotra, a linguistics-oriented system; it does not claim to use any 'deep understanding' or 'intelligence', and hence no AI-type explicit 'expertise' is incorporated in GETA-ARIANE, although the possibility of grafting on an 'expert' error correction mechanism was investigated by Boitet and Gerber (1986). However, unlike other linguistics-based systems, ARIANE extends translation analysis to sequences of several sentences or paragraphs, in order to deal with problems of anaphora and tense/aspect agreement. For practical production the system permits optional pre-editing, primarily the marking of lexical ambiguities; post-editing can be done using the REVISION program developed for ARIANE-78. It is a mainframe batch system with no human interaction during processing. However, Zajac (1986) has investigated an interactive analysis module for GETA, somewhat on the lines of Tomita's research at Carnegie-Mellon (Tomita 1986). One important development has been the refinement of the theoretical basis, particularly the clarification of the distinction and the relationship between dynamic and static grammars in the system. Static grammars (or SCSG 'structural correspondence static grammars') record the correspondences between NL strings and their equivalent interface structures in a formalism which is neutral with respect to analysis and synthesis.
The processes of analysis and generation are handled by 'dynamic grammars' written in appropriate 'special languages' (SLLPs or Special Languages for Linguistic Programming): ATEF for morphological analysis, ROBRA for structural analysis, structural transfer and syntactic generation, EXPANS for lexical transfer, and SYGMOR for morphological generation. (The distinction between 'static' and 'dynamic' grammars is now found in many advanced transfer systems; the GETA project has been a leading force in this theoretical development.) Equally important have been the improvements to the research environment, in tools for the development of systems, such as ATLAS for lexicographic work and VISULEX for viewing complex dictionary entries. Such tools are components of a 'linguistic workstation' for MT research (an idea also being developed by the Saarbrücken and the Kyoto groups, 15. and below). Within this environment the work of the Calliope project has taken place: the compilation of the static grammars for English and French during 1983-84, their corresponding dynamic grammars, and the substantial lexicographic work. The Grenoble group has always encouraged and supported other MT projects using GETA software, and thereby helped to train MT researchers. ARIANE is regarded above all as "an integrated programming environment" for the development and building of "a variety of linguistic models, in order to test the general multilingual design and the various facilities for lingware preparation..." The ARIANE software has been tested on an impressive range of languages, often in small-scale experiments (Vauquois and Boitet 1985/1988; Hutchins 1986: 247-8; Boitet 1987a), but sometimes in larger projects, e.g. the English-Malay project mentioned elsewhere in this survey. The largest GETA-ARIANE system has been for Russian-French translation, which built upon previous experience with CETA.
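The division of labour among the dynamic grammars described above (morphological analysis, structural analysis, lexical transfer, structural transfer, morphological generation) can be pictured as a chain of stages, each consuming the output of the previous one. The following is only a minimal sketch in Python under strong assumptions: the lexicon, the bilingual dictionary, the `Node` class and all function names are hypothetical illustrations, not the ATEF, ROBRA, EXPANS or SYGMOR formalisms themselves, and syntactic generation is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A labelled tree node; leaves carry a lemma, inner nodes only a category."""
    label: str
    lemma: str = ""
    children: List["Node"] = field(default_factory=list)

# Hypothetical toy French lexicon: surface form -> (lemma, category).
LEXICON = {"le": ("le", "DET"), "frein": ("frein", "N"), "noir": ("noir", "ADJ")}
# Hypothetical toy bilingual dictionary used for lexical transfer.
BILINGUAL = {"le": "the", "frein": "brake", "noir": "black"}

def morphological_analysis(text: str) -> List[Node]:
    """String -> flat list of lemma/category leaves (the stage ATEF covers in ARIANE)."""
    return [Node(label=LEXICON[w][1], lemma=LEXICON[w][0]) for w in text.lower().split()]

def structural_analysis(leaves: List[Node]) -> Node:
    """Leaves -> tree; this sketch assumes every input is a single noun phrase."""
    return Node(label="NP", children=leaves)

def lexical_transfer(tree: Node) -> Node:
    """Substitute target lemmas for source lemmas, keeping the structure."""
    return Node(tree.label, BILINGUAL.get(tree.lemma, tree.lemma),
                [lexical_transfer(c) for c in tree.children])

def structural_transfer(tree: Node) -> Node:
    """Tree transduction: reorder NP children so adjectives precede the noun."""
    kids = [structural_transfer(c) for c in tree.children]
    if tree.label == "NP":
        order = {"DET": 0, "ADJ": 1, "N": 2}
        kids = sorted(kids, key=lambda c: order.get(c.label, 3))
    return Node(tree.label, tree.lemma, kids)

def morphological_generation(tree: Node) -> str:
    """Tree -> string: read the leaves off from left to right."""
    if not tree.children:
        return tree.lemma
    return " ".join(morphological_generation(c) for c in tree.children)

# The stages are applied strictly in sequence, as in a batch transfer system.
PIPELINE = [morphological_analysis, structural_analysis, lexical_transfer,
            structural_transfer, morphological_generation]

def translate(text: str) -> str:
    result = text
    for stage in PIPELINE:
        result = stage(result)
    return result

print(translate("le frein noir"))
```

Keeping each stage as a separate function mirrors the modular design the survey describes, in which each phase is written in its own special language and can be developed or replaced independently.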
Since 1983 the Russian-French system has been extensively and regularly tested in an experimental 'translation unit'; large corpora of text have been translated, including some 200,000 running words during one 18-month period (Boitet 1987b). Another large-scale system was the German-French system developed by Guilbaud and Stahl, using the same generator programs as in the Russian-French system. Its principal features were the attention given to morphological derivation and inflection, and the restriction of structural analysis almost wholly to morphological and syntactic data, with little or no use of semantic information. The system has been described by Guilbaud (1984/1987), but there has been little development of the system since 1984 (Boitet 1987b). The most important practical application of a large-scale system has, however, been through GETA's involvement in the French national computer-assisted translation project (NCATP). Launched in November 1983 (after a preparatory stage in 1982-83, the ESOPE project), the Calliope project has been financed 50% from public funds (administered by the Agence d'Informatique) and 50% from private sources. One source has been B'VITAL, founded in 1984 by the Grenoble group, which is responsible for the machine-readable dictionaries and for the 'static grammars' (Joscelyne 1987). Another has been Sonovision, which was to provide the aeronautics terminology for the major French-English system Calliope-Aero. After a demonstration of a prototype of Calliope-Aero at Expolangues in February 1986, it was decided also to develop an English-French system for the translation of computer science and data processing materials, Calliope-Info. In addition to these MT systems, both batch systems, the project was also to produce a translator's workstation (Calliope-Revision, organised around a Bull Questar 400 microcomputer) for preparing and post-editing texts and for access to remote term banks, with OCR and desk-top publishing facilities included.
This was essential if the systems were to be fully integrated into an industrial documentation environment. However, given the expected delays, there have been plans by SG2 (one of the backers) to develop a terminology aid with split-screen word processing, Calliope-Manuel. Whatever the commercial feasibility of the Calliope project, which came to a formal end in February 1987 (Boitet 1987a), the experience will no doubt be put to good use by the GETA project, in particular the experience of dealing with complex dictionaries and the type of scientific and technical sublanguage presented by aeronautics. Boitet (1986), for example, mentions the successful treatment of complex noun phrases (e.g. la jonction bloc frein et raccord de tuyauterie) and complex adjectival phrases (e.g. comprise entre les deux index noir). Other problems did not occur in the sublanguage and thus were put aside, e.g. interrogatives, relative clauses introduced by 'dont', imperatives, certain comparatives, nominal groups which do not consist only of nouns, and so forth. The NCATP has had other consequences. It stimulated the conversion of ARIANE-85 to run on the IBM PC AT (with a minimum 20 MB hard disk), adequate for MT development but not for a production system. It also encouraged the writing of new software in a French dialect of LISP (Boitet 1986; Boitet 1987a, 1987b), with the aim of creating a fully multilingual system with a single 'special language' for processing strings and trees (TETHYS). Clearly, GETA has continued to advance the boundaries of MT research.

14. While GETA is the main MT research centre in France, there are other MT projects in Nancy and Poitiers. At Nancy, Chauché (1986; Rolf & Chauché 1986) continues his research, begun at Grenoble, on algorithms for tree manipulation which are suitable for MT systems. Tests of the algorithms have been applied to Spanish-French and Dutch-French experiments (in collaboration with Rolf of Nijmegen University).
From Poitiers, Popesco (1986) reports a small-scale knowledge-based MT experiment for translating Rumanian texts on three-dimensional geometry into French. The ATN parser produces a conceptual frame-slot representation from which the generator devises a 'plan' for producing TL output. The restricted language system TITUS, designed for multilingual treatment of abstracts in the textile industry, has expanded in its latest version TITUS IV (Ducrot 1985) in order to deal with a wider range of subjects and to allow somewhat freer expression of contents. As elsewhere, there is commercial interest in translators’ workstations: Cap Sogeti Innovations is proposing a "language engineering workshop", providing 'intelligent' language tools, a dedicated multilingual word processor, a natural language knowledge base, a technical summary writer, and a 'text analyzer' which will produce abstract meaning representations. Details are necessarily vague at present (Joscelyne 1987). Attitudes to MT in France are most likely to be changed by the provision of MT services on Minitel. The availability of Systran has already been mentioned (sect. 1 above). Other services include a number of dictionaries and term banks: the Harrap French and English slang dictionary, the Dictionary of Industries, Normaterm (the term bank of the French standards organisation AFNOR), the DAICADIF lexicon for telecommunications, and (next year) FRANTEXT, the historical dictionary Trésor de la Langue Française.

15. The largest and most long-established MT group in Germany is based at Saarbrücken. It began in the mid-1960s with research on Russian-German translation, sponsored from 1972 to 1986 by the Deutsche Forschungsgemeinschaft. The SUSY project expanded into a multilingual system, based on the transfer approach, with the source languages German, Russian, English, French, and Esperanto, and the target languages German, English and French.
Detailed descriptions of the latest version SUSY II as at the end of 1984 are given by Maas (1984/1987) and by Blatt et al. (1985), and summarised by Hutchins (1986: 233-239). The most recent developments of MT research at Saarbrücken are to be found in Zimmermann et al. (1987). The most significant are the changes introduced into the basic design by the introduction of English as a SL (in SUSY-

[1]  Martin Kay,et al.  Functional Unification Grammar: A Formalism for Machine Translation , 1984, ACL.

[2]  Liana Popesco Limited context semantic translation from a single knowledge-base for a natural language and structuring metarules , 1986, Comput. Humanit..

[3]  Pierre Isabelle,et al.  TAUM-AVIATION: Its Technical Features and Some Experimental Results , 1985, Comput. Linguistics.

[4]  Fred Stentiford,et al.  A speech driven language translation system , 1987, ECST.

[5]  David D McDonald,et al.  Natural Language Generation: Complexities and Techniques , 1986 .

[6]  Hiroyuki Kaji HICATS/JE: A Japanese-to-English Machine Translation System Based on Semantics , 1987, MTSUMMIT.

[7]  Jonathan Slocum,et al.  A Survey of Machine Translation: Its History, Current Status and Future Prospects , 1985, CL.

[8]  Klaus Schubert Metataxis: Contrastive Dependency Syntax for Machine Translation , 1987 .

[9]  Hirosato Nomura,et al.  Lexical-Functional Transfer: A Transfer Framework in a Machine Translation System Based on LFG , 1986, COLING.

[10]  T. Nishida,et al.  Machine translation: Japanese perspectives , 1985, TC.

[11]  Christian Rohrer Linguistic Bases for Machine Translation , 1986, COLING.

[12]  Vladimir Pericliev,et al.  Handling Syntactical Ambiguity in Machine Translation , 1984, ACL.

[13]  Muriel Vasconcellos,et al.  SPANAM and ENGSPAN: Machine Translation at the Pan American Health Organization , 1985, Comput. Linguistics.

[14]  P. C. Rolf,et al.  Machine translation and the SYGMART system , 1986, Comput. Humanit..

[15]  John Lehrberger,et al.  Machine Translation: Linguistic characteristics of MT systems and general methodology of evaluation , 1988 .

[16]  Annely Rothkegel,et al.  Pragmatics in Machine Translation , 1986, COLING.

[17]  Mike Rosner,et al.  The Design of the Kernel Architecture for the Eurotra Software , 1984, ACL.

[18]  Jaime G. Carbonell,et al.  Another Stride Towards Knowledge-Based Machine Translation , 1986, COLING.

[19]  Erwin Stegentritt,et al.  ASCOF - A Modular Multilevel System for French-German Translation , 1985, Comput. Linguistics.

[20]  André Schenk,et al.  Idioms in the Rosetta Machine Translation System , 1986, COLING.

[21]  Randall Sharp,et al.  A Parametric NL Translator , 1986, COLING.

[22]  Jun'ichi Tsujii,et al.  The Japanese Government Project for Machine Translation , 1985, Comput. Linguistics.

[23]  D. Arnold Eurotra: A European perspective on MT , 1986, Proceedings of the IEEE.

[24]  Koichiro Ishihara,et al.  A proper treatment of syntax and semantics in machine translation , 1984 .

[25]  Prospects in Machine Translation , 1987, MTSUMMIT.

[26]  Rémi Zajac SCSL: A linguistic specification language for MT , 1986, COLING.

[27]  Jaime G. Carbonell,et al.  Knowledge-Based Machine Translation, The CMU Approach , 1987 .

[28]  H. Uchida ATLAS: Fujitsu Machine Translation System , 1987, MTSUMMIT.

[29]  Pete Whitelock,et al.  Machine Translation as an Expert Task , 1985, TMI.

[30]  Franciska de Jong,et al.  Synonymy and Translation , 1987 .

[31]  Pierre Isabelle,et al.  Transfer and MT Modularity , 1986, COLING.

[32]  Nils J. Nilsson,et al.  Artificial Intelligence , 1974, IFIP Congress.

[33]  Tim Johnson,et al.  Natural Language Computing: The Commercial Applications , 1984, The Knowledge Engineering Review.

[34]  B. C. Papegaaij Word Expert Semantics: An Interlingual Knowledge-Based Approach , 1986 .

[35]  Károly Fábricz Particle Homonymy and Machine Translation , 1986, COLING.

[36]  Ch. Boitet Current projects at GETA on or about machine translation , 1989 .

[37]  Taijiro Tsutsumi A Prototype English-Japanese Machine Translation System for Translating IBM Computer Manuals , 1986, COLING.

[38]  Rika Yoshi A Robust Machine Translation System , 1986, Other Conferences.

[39]  J. Chauché Deduction Automatique et Systemes Transformationnels , 1986, COLING.

[40]  Loong Cheong Tong English-Malay Translation System: a Laboratory Prototype , 1986, COLING.

[41]  Alan K. Melby Lexical Transfer: A Missing Element in Linguistics Theories , 1986, COLING.

[42]  Haruya Matsumoto,et al.  PERSIS: A Natural-Language Analyzer for Persian , 1986 .

[43]  Tosiyasu L. Kunii,et al.  NARA: A Two-way Simultaneous Interpretation System between Korean and Japanese -A methodological study- , 1986, COLING.

[44]  Philip J. Hayes,et al.  Entity-Oriented Parsing , 1984, ACL.

[45]  Erich H. Steiner Generating Semantic Structures in EUROTRA-D , 1986, COLING.

[46]  Paul Schmidt Valency Theory in Stratificational MT-System , 1986, COLING.

[47]  Geoffrey K. Pullum,et al.  Generalized Phrase Structure Grammar , 1985 .

[48]  E. Luctkens,et al.  A Prototype Machine Translation Based on Extracts from Data Processing Manuals , 1986, COLING.

[49]  Dietmar F. Rösner,et al.  Language Generation From Conceptual Structure: Synthesis Of German In A Japanese/German MT Project , 1984, COLING.

[50]  D. W. Barron Machine Translation , 1968, Nature.

[51]  Stuart M. Shieber,et al.  An Introduction to Unification-Based Approaches to Grammar , 1986, CSLI Lecture Notes.

[52]  Derek Lewis The development and progress of machine translation systems , 1985 .

[53]  Jaime G. Carbonell,et al.  Another Stride Towards Knowledge-Based Machine Translation: An Entity Oriented Approach , 1986 .

[54]  Christian Boitet,et al.  Automated Translation at Grenoble University , 1985, Comput. Linguistics.

[55]  SYSTRAN: A machine translation system to meet user needs , 1987, MTSUMMIT.

[56]  Tetsuya Ishikawa,et al.  Concept and Structure of Semantic Markers for Machine Translation in Mu-Project , 1986, COLING.

[57]  Sergei Nirenburg,et al.  Machine translation: theoretical and methodological issues , 1987 .

[58]  Harold L. Somers The need for MT-oriented versions of Case and Valency in MT , 1986, COLING.

[59]  Lisette Appelo,et al.  A Compositional Approach to the Translation of Temporal Expressions in the Rosetta System , 1986, COLING.

[60]  David R. Dowty,et al.  Introduction to Montague semantics , 1980 .

[61]  Margaret King,et al.  EUROTRA: A Multilingual System under Development , 1985, Comput. Linguistics.

[62]  Masaru Tomita Disambiguating grammatically ambiguous sentences by asking , 1984 .

[63]  Pete Whitelock,et al.  Strategies for Interactive Machine Translation: the experience and implications of the UMIST Japanese project , 1986, COLING.

[64]  Kazunori Muraki,et al.  Augmented Dependency Grammar: A Simple Interface between the Grammar Rule and the Knowledge , 1985, EACL.

[65]  Hirosato Nomura,et al.  Computer Environment for Meaning Structure Representation and Manipulation in Machine Translation System (Papers Presented at the International Conference on Information and Knowledge'87, on the Theme "Dissemination") , 1989 .

[66]  Hiroshi Uchida Fujitsu machine translation system: ATLAS , 1986, Future Gener. Comput. Syst..

[67]  R. E. Miller,et al.  Automated translation of German to English medical text. , 1986, The American journal of medicine.

[68]  Mike Rosner,et al.  The , T Framework in Eurotra: A Theoretically Committed Notation for MT , 1986, COLING.

[69]  Jun'ichi Tsujii,et al.  The Transfer Phase of the Mu Machine Translation System , 1986, COLING.

[70]  Heinz-Dirk Luckhardt Der Transfer in der maschinellen Sprachübersetzung , 1987 .