Kernelising the Proportional Odds Model through kernel learning techniques

The classification of patterns into naturally ordered labels is known as ordinal regression, a setting that arises frequently in real-world applications. One of the most widely used ordinal regression algorithms is the Proportional Odds Model (POM), despite the linearity of its decision boundaries. This paper explores the kernel trick and the notion of the empirical feature space to reformulate the POM and obtain nonlinear decision boundaries. In addition, it proposes a new technique for aligning the kernel matrix that takes the ordinal problem information into account, together with a regularised gradient-ascent methodology for selecting the optimal dimensionality of the empirical feature space. The proposed methodologies are evaluated on a nonlinearly separable toy dataset and through an extensive set of experiments over 28 ordinal datasets. The results indicate that they are competitive with other state-of-the-art algorithms and significantly improve on the original POM.
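The core idea of the empirical feature space can be sketched as follows: an eigendecomposition of the training kernel matrix yields an explicit finite-dimensional map whose inner products reproduce the kernel, so any linear model (such as the POM) fitted on the mapped data becomes a kernel model. This is a minimal illustrative sketch, not the paper's implementation; the RBF kernel, the `gamma` value, and the function names are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def empirical_feature_map(K, r=None, tol=1e-10):
    """Map the training points into an r-dimensional empirical feature space.

    K is the (n x n) training kernel matrix. The map is Phi = V_r * Lambda_r^{1/2},
    so that Phi @ Phi.T recovers K when r equals the rank of K. Choosing r < rank(K)
    truncates the spectrum, which is where a dimensionality-selection strategy
    (such as the paper's regularised gradient ascent) would come in.
    """
    w, V = np.linalg.eigh(K)             # eigh returns ascending eigenvalues
    order = np.argsort(w)[::-1]          # reorder to descending
    w, V = w[order], V[:, order]
    if r is None:
        r = int((w > tol).sum())         # keep the numerically nonzero spectrum
    return V[:, :r] * np.sqrt(np.maximum(w[:r], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X, X, gamma=0.5)
Phi = empirical_feature_map(K)
# Inner products in the empirical feature space reproduce the kernel matrix,
# so a linear ordinal model fitted on Phi is effectively a kernelised one.
print(np.allclose(Phi @ Phi.T, K))
```

A linear POM (an ordered-logit fit) applied to `Phi` then yields nonlinear decision boundaries in the original input space, which is the reformulation the abstract describes.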
