Evaluation of machine translation and translation tools

While there is general agreement about the basic features of machine translation (MT) evaluation, as reflected in general introductory texts (Lehrberger & Bourbeau, 1988; Hutchins & Somers, 1992; Arnold et al., 1994), there are no universally accepted and reliable methods and measures, and evaluation methodology has been the subject of much discussion in recent years (e.g. Arnold et al., 1993; Falkedal, 1994; AMTA, 1992).

[1]  George R. Klare, et al. Further Experiments in Language Translation: A Second Evaluation of the Readability of Computer Translations, 1973.

[2]  Elizabeth Zoltan-Ford, et al. How to Get People to Say and Type What Computers Can Understand, 1991, Int. J. Man Mach. Stud.

[3]  Herman J. M. Steeneken. Quality Evaluation of Speech Processing Systems, 1992.

[4]  Sharon L. Oviatt, et al. Toward interface design for human language technology: Modality and structure as determinants of linguistic complexity, 1994, Speech Communication.

[5]  Julia Galliers, et al. Evaluating natural language processing systems, 1995.

[6]  Kathryn M. Dobroth, et al. Automating Services with Speech Recognition over the Public Switched Telephone Network: Human Factors Considerations, 1991, IEEE J. Sel. Areas Commun.

[7]  K. D. Kryter. Methods for the Calculation and Use of the Articulation Index, 1962.

[8]  Louis C. W. Pols. Voice quality of synthetic speech: representation and evaluation, 1994, ICSLP.

[9]  Louis C. W. Pols. Multi-lingual synthesis evaluation methods, 1992, ICSLP.

[10]  John Lehrberger, et al. Machine Translation: Linguistic characteristics of MT systems and general methodology of evaluation, 1988.

[11]  Sharon L. Oviatt, et al. Predicting spoken disfluencies during human-computer interaction, 1995, Comput. Speech Lang.

[12]  Herman J. M. Steeneken, et al. A multi-language evaluation of the RASTI method for estimating speech intelligibility in auditoria, 1982.

[13]  Patrick J. Grother, et al. The First Census Optical Character Recognition Systems Conference, 1992.

[14]  Louis C. W. Pols. Speech technology systems: performance and evaluation, 1994.

[15]  K. D. Kryter, et al. Articulation-Testing Methods: Consonantal Differentiation with a Closed-Response Set, 1965, The Journal of the Acoustical Society of America.

[16]  Philip R. Cohen, et al. The role of voice in human-machine communication, 1994.

[17]  Margaret King, et al. Using Test Suites in Evaluation of Machine Translation Systems, 1990, COLING.

[18]  Lorna Balkan, et al. Test Suites for Natural Language Processing, 1995, TC.

[19]  Donna K. Harman, et al. Overview of the First Text REtrieval Conference (TREC-1), 1992, TREC.

[20]  Doug Arnold, et al. Machine Translation: An Introductory Guide, 1994.

[21]  Karen Spärck Jones. Towards Better NLP System Evaluation, 1994, HLT.

[22]  Ute Jekosch. Speech quality assessment and evaluation, 1993, EUROSPEECH.

[23]  D. B. Pisoni, et al. Segmental intelligibility of synthetic speech produced by rule, 1989, The Journal of the Acoustical Society of America.

[24]  Georges Van Slype. Conception d’une méthodologie générale d’évaluation de la traduction automatique, 1982.

[25]  Ralph Grishman, et al. Evaluating syntax performance of parser/grammars, 1991.

[26]  Les E. Atlas, et al. The challenge of spoken language systems: research directions for the nineties, 1995, IEEE Trans. Speech Audio Process.

[27]  Herman J. M. Steeneken, et al. Objective assessment of speech communication systems; introduction of a software based procedure, 1993, EUROSPEECH.

[28]  John R. Pierce, et al. Language and Machines: Computers in Translation and Linguistics, 1966.

[29]  Stephen V. Rice, et al. The Third Annual Test of OCR Accuracy, 1994.

[30]  A. Tomkins, et al. Validation of document image defect models for optical character recognition, 1994.

[31]  Jonathan G. Fiscus, et al. 1993 Benchmark Tests for the ARPA Spoken Language Program, 1994, HLT.

[32]  H. Wallace Sinaiko, et al. Further experiments in language translation: readability of computer translations, 1972.

[33]  Stephen V. Rice, et al. An Evaluation of OCR Accuracy, 1993.

[34]  Roland Hausser. The Coordinator's Final Report on the first Morpholympics, 1994, LDV Forum.

[35]  Harold L. Somers, et al. An introduction to machine translation, 1992.

[36]  K. Ishii, et al. Generation of distorted characters and its applications, 1983.

[37]  Klaus Fellbaum, et al. An evaluation system for ascertaining the quality of synthetic speech based on subjective category rating tests, 1993, EUROSPEECH.

[38]  Murray F. Spiegel. Using the ORATOR® synthesizer for a public reverse-directory service: design, lessons, and recommendations, 1993, EUROSPEECH.

[39]  John S. White, et al. The ARPA MT Evaluation Methodologies: Evolution, Lessons, and Future Approaches, 1994, AMTA.

[40]  I. Yamashita, et al. State of the art of handwritten numeral recognition in Japan-The results of the first IPTP character recognition competition, 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[41]  George Nagy, et al. Performance metrics for document understanding systems, 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[42]  Robert C. Moore. Semantic Evaluation for Spoken-Language Systems, 1994, HLT.

[43]  T. Houtgast, et al. A physical method for measuring speech-transmission quality, 1980, The Journal of the Acoustical Society of America.

[44]  Louis C. W. Pols. Quality assessment of text-to-speech synthesis-by-rule, 1991.

[45]  Judith Spitz. Collection and Analysis of Data from Real Users: Implications for Speech Recognition/Understanding Systems, 1991, HLT.

[46]  Louis C. W. Pols, et al. A structured way of looking at the performance of text-to-speech systems, 1994, SSW.

[47]  Henry S. Baird, et al. Document image defect models, 1995.

[48]  Ralph Grishman, et al. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars, 1991, HLT.

[49]  Robert M. Haralick, et al. Global and local document degradation models, 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).