Toward a Scoring Function for Quality-Driven Machine Translation

We describe how we constructed an automatic scoring function for machine translation quality. The function can draw on arbitrarily many natural language processing tools designed to process English text. Using as features both values of functions available inside these tools and values of functions we construct over their output, we obtain preliminary, positive results in machine-learning the difference between human-produced English and machine-translated English. We suggest how the scoring function may be used for MT system development.
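To make the idea concrete, the following is a minimal sketch of such a scoring function, not the system described here. It assumes two hypothetical feature functions (`mean_word_length` and `function_word_ratio`) as stand-ins for values that real NLP tools would supply, a tiny invented training set, and a plain logistic regression as the learner; the abstract does not specify the features or the learning method actually used.

```python
import math
from typing import List, Sequence

# Hypothetical stand-ins for values taken from NLP tools (for example a
# parser score or a language-model probability). Not from the paper.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "is", "was", "on"}


def mean_word_length(sentence: str) -> float:
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def function_word_ratio(sentence: str) -> float:
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(w in FUNCTION_WORDS for w in words) / len(words)


FEATURES = [mean_word_length, function_word_ratio]


def featurize(sentence: str) -> List[float]:
    # Leading 1.0 is the bias term.
    return [1.0] + [f(sentence) for f in FEATURES]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Sequence[float], sentence: str) -> float:
    """Estimated probability that the sentence is human-produced English."""
    x = featurize(sentence)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(sentences: Sequence[str], labels: Sequence[int],
          epochs: int = 500, lr: float = 0.05) -> List[float]:
    """Fit logistic regression by stochastic gradient ascent (1 = human)."""
    weights = [0.0] * (len(FEATURES) + 1)
    rows = [featurize(s) for s in sentences]
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i, xi in enumerate(x):
                weights[i] += lr * (y - p) * xi
    return weights


# Invented toy data: label 1 = human-produced, 0 = machine-translated.
TRAIN_SENTENCES = [
    "the cat sat on the mat",
    "cat sit mat",
    "a dog was in the garden",
    "dog garden be",
]
TRAIN_LABELS = [1, 0, 1, 0]

weights = train(TRAIN_SENTENCES, TRAIN_LABELS)
for sentence in TRAIN_SENTENCES:
    print(f"{score(weights, sentence):.3f}  {sentence}")
```

In MT system development, a score of this kind could be computed over system output before and after a change to see whether the output has moved closer to human-produced English.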
