Gaussian Processes for Ordinal Regression

We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and on the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on several benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
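The threshold likelihood and the Laplace approximation described above can be sketched in a few lines of NumPy. In this sketch the ordinal likelihood for labels 1..r is P(y = j | f) = Φ((b_j - f)/σ) - Φ((b_{j-1} - f)/σ), with b_0 = -∞ and b_r = +∞, which reduces to the probit model when r = 2. The posterior mode is found with the numerically stable Newton iteration from Rasmussen and Williams' book Gaussian Processes for Machine Learning (Algorithm 3.1); that is an implementation choice made here, not necessarily the authors' exact procedure. The squared-exponential kernel, the fixed thresholds `b`, the noise level `sigma`, the helper names, and the toy data are all illustrative assumptions, and the expectation propagation variant is not shown. The returned approximate log evidence is the quantity one would maximize over `sigma`, `b`, and the kernel hyperparameters for model selection.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without needing SciPy


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def norm_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)


def rbf(X1, X2, ell=1.0, amp=4.0):
    """Squared-exponential covariance (an assumed kernel choice, not from the text)."""
    d2 = np.sum(X1 ** 2, 1)[:, None] + np.sum(X2 ** 2, 1)[None, :] - 2.0 * X1 @ X2.T
    return amp * np.exp(-0.5 * d2 / ell ** 2)


def ordinal_derivs(f, y, b, sigma):
    """Log-likelihood, gradient, and W = -d^2/df^2 of
    log P(y_i | f_i) = log[Phi(z1) - Phi(z2)],
    z1 = (b_{y_i} - f_i) / sigma, z2 = (b_{y_i - 1} - f_i) / sigma."""
    edges = np.concatenate(([-np.inf], b, [np.inf]))  # b_0 = -inf, b_r = +inf
    z1 = (edges[y] - f) / sigma
    z2 = (edges[y - 1] - f) / sigma
    Z = np.maximum(norm_cdf(z1) - norm_cdf(z2), 1e-300)  # guard against underflow
    p1, p2 = norm_pdf(z1), norm_pdf(z2)
    # z * pdf(z) -> 0 as |z| -> inf, so zero out the infinite edges before multiplying.
    zp1 = np.where(np.isfinite(z1), z1, 0.0) * p1
    zp2 = np.where(np.isfinite(z2), z2, 0.0) * p2
    grad = (p2 - p1) / (sigma * Z)
    # W >= 0 in exact arithmetic (the likelihood is log-concave); clip round-off.
    W = np.maximum(grad ** 2 + (zp1 - zp2) / (sigma ** 2 * Z), 0.0)
    return np.sum(np.log(Z)), grad, W


def laplace_mode(K, y, b, sigma, tol=1e-8, max_iter=100):
    """Newton search for the mode of p(f | y) with prior f ~ N(0, K)."""
    n = len(y)
    f, a, obj_old = np.zeros(n), np.zeros(n), -np.inf
    for _ in range(max_iter):
        _, grad, W = ordinal_derivs(f, y, b, sigma)
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        rhs = W * f + grad
        v = np.linalg.solve(L, sW * (K @ rhs))
        a = rhs - sW * np.linalg.solve(L.T, v)
        f = K @ a  # new estimate of the mode; a = K^{-1} f
        obj = -0.5 * a @ f + ordinal_derivs(f, y, b, sigma)[0]
        if abs(obj - obj_old) < tol:
            break
        obj_old = obj
    # Recompute curvature and the Cholesky factor of B = I + W^1/2 K W^1/2 at the mode.
    loglik, grad, W = ordinal_derivs(f, y, b, sigma)
    sW = np.sqrt(W)
    L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
    # Laplace log evidence: -1/2 f'K^{-1}f + log p(y|f) - 1/2 log|B|.
    log_evidence = -0.5 * a @ f + loglik - np.sum(np.log(np.diag(L)))
    return f, grad, sW, L, log_evidence


def predict_proba(K_star, k_ss, grad, sW, L, b, sigma):
    """P(y* = j), j = 1..r. K_star: (n, m) train-test covariances; k_ss: (m,)."""
    mu = K_star.T @ grad  # predictive mean of f*
    V = np.linalg.solve(L, sW[:, None] * K_star)
    var = np.maximum(k_ss - np.sum(V ** 2, axis=0), 0.0)  # predictive variance of f*
    s = np.sqrt(sigma ** 2 + var)  # probit noise and latent uncertainty add
    edges = np.concatenate(([-np.inf], b, [np.inf]))
    cdf = norm_cdf((edges[None, :] - mu[:, None]) / s[:, None])  # shape (m, r + 1)
    return np.diff(cdf, axis=1)  # shape (m, r); each row sums to 1


# Toy data (hypothetical): three ordered classes along one input dimension.
X = np.linspace(-3.0, 3.0, 30)[:, None]
y = np.digitize(X[:, 0], [-1.0, 1.0]) + 1  # labels 1, 2, 3
b, sigma = np.array([-1.0, 1.0]), 0.5      # held fixed here instead of learned
f_hat, grad, sW, L, log_ev = laplace_mode(rbf(X, X), y, b, sigma)
X_test = np.array([[-3.0], [0.0], [3.0]])
P = predict_proba(rbf(X, X_test), np.diag(rbf(X_test, X_test)), grad, sW, L, b, sigma)
print(np.round(P, 3), log_ev)
```

Under the Laplace approximation the latent posterior is Gaussian, so the class probabilities at a test input keep the same threshold form as the likelihood, with σ² replaced by σ² plus the predictive variance of f*.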
