Evolving Gaussian Process Kernels for Translation Editing Effort Estimation

Many Natural Language Processing problems require combining machine learning with optimization techniques. One such problem is estimating the effort a human editor needs to improve a text produced by a machine translation system. Recent work in this area has shown that Gaussian Processes can predict post-editing effort accurately. However, the Gaussian Process kernel has to be chosen in advance, and this choice influences the quality of the prediction. In this paper, we propose a Genetic Programming algorithm that evolves kernels for Gaussian Processes. We show that combining evolutionary optimization with Gaussian Processes removes the need to specify the kernel a priori, and yields predictions that, in many cases, outperform those obtained with fixed kernels.
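
To make the role of the kernel choice concrete, the following minimal sketch scores a few hand-built composite kernels by the Gaussian Process log marginal likelihood on synthetic data. It is not the Genetic Programming algorithm proposed in the paper: the base kernels, the fixed hyperparameters (`lengthscale`, `noise`), the toy data, and the use of the marginal likelihood as the selection score are all assumptions made for illustration. An evolutionary search would generate and recombine such kernel expressions automatically instead of enumerating them by hand.

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    # Squared-exponential kernel; the lengthscale is fixed here (assumption).
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def linear(X, Y):
    # Linear (dot-product) kernel.
    return X @ Y.T

def k_sum(k1, k2):
    # The sum of two valid kernels is a valid kernel.
    return lambda X, Y: k1(X, Y) + k2(X, Y)

def k_prod(k1, k2):
    # The element-wise product of two valid kernels is a valid kernel.
    return lambda X, Y: k1(X, Y) * k2(X, Y)

def log_marginal_likelihood(kernel, X, y, noise=0.1):
    # Zero-mean GP regression evidence, computed via a Cholesky factorization.
    n = len(y)
    K = kernel(X, X) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

# Synthetic 1-D regression data (illustrative only, not post-editing data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 0] + 0.1 * rng.standard_normal(40)

# A hand-enumerated candidate set standing in for an evolved population.
candidates = {
    "linear": linear,
    "rbf": rbf,
    "rbf + linear": k_sum(rbf, linear),
    "rbf * linear": k_prod(rbf, linear),
}
scores = {name: log_marginal_likelihood(k, X, y) for name, k in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:>14s}: {score:.2f}")
print("highest evidence:", best)
```

The scores differ across candidates on the same data, which is the dependence on kernel choice that the abstract refers to. In a fuller treatment the hyperparameters of each candidate would be optimized before scoring rather than held fixed.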
