Gaussian Processes (GPs) are a powerful modelling framework incorporating kernels and Bayesian inference, and are recognised as state-of-the-art for many machine learning tasks. Despite this, GPs have seen few applications in natural language processing (notwithstanding several recent papers by the authors). We argue that the GP framework offers many benefits over commonly used machine learning frameworks, such as linear models (logistic regression, least squares regression) and support vector machines. Moreover, GPs are extremely flexible and can be incorporated into larger graphical models, forming an important additional tool for probabilistic inference.

Notably, GPs are one of the few models that support analytic Bayesian inference, avoiding the many approximation errors that plague the approximate inference techniques in common use for Bayesian models (e.g. MCMC, variational Bayes). GPs accurately model not just the underlying task, but also the uncertainty in the predictions, such that uncertainty can be propagated through pipelines of probabilistic components. Overall, GPs provide an elegant, flexible and simple means of probabilistic inference and are well overdue for consideration by the NLP community.

This tutorial will focus primarily on regression and classification, both fundamental techniques in widespread use in the NLP community. Within NLP, linear models are near ubiquitous: they provide good results for many tasks, support efficient inference (including dynamic programming in structured prediction) and allow simple parameter interpretation. However, linear models are inherently limited in the types of relationships between variables they can model. Often
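The claim that GPs model predictive uncertainty, not just point estimates, can be illustrated with a minimal regression sketch. The snippet below uses scikit-learn's `GaussianProcessRegressor` on hypothetical toy data (the data and kernel choices are assumptions for illustration, not part of the tutorial's material); the key point is that `predict(..., return_std=True)` returns a calibrated standard deviation that grows away from the observed inputs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical 1-D regression data: noisy samples of sin(x) on [0, 10].
rng = np.random.RandomState(0)
X = rng.uniform(0.0, 10.0, size=(20, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(20)

# An RBF kernel plus a white-noise term; hyperparameters are fitted by
# maximising the marginal likelihood (analytic for GP regression).
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)

# Predictions come with uncertainty estimates: the predictive standard
# deviation is small inside the data range and large far outside it.
X_test = np.array([[5.0], [25.0]])  # one in-range point, one far outside
mean, std = gp.predict(X_test, return_std=True)
print(std[1] > std[0])  # uncertainty is larger away from the data
```

It is exactly this per-prediction standard deviation that can be passed downstream when GP components are chained in a probabilistic pipeline.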