Incremental Sparse Bayesian Ordinal Regression

Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning. An important class of approaches to OR models the problem as a linear combination of basis functions that map features to a high-dimensional non-linear space. However, most basis-function-based algorithms are time-consuming. We propose an incremental sparse Bayesian approach to OR tasks and introduce an algorithm to sequentially learn the relevant basis functions in the ordinal scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression (ISBOR), automatically optimizes the hyper-parameters via the type-II maximum likelihood method. By exploiting fast marginal likelihood optimization, ISBOR avoids large matrix inversions, which are the main bottleneck in applying basis-function-based algorithms to OR tasks on large-scale datasets. We show that ISBOR can make accurate predictions with a parsimonious set of basis functions while offering automatic estimates of the prediction uncertainty. Extensive experiments on synthetic and real-world datasets demonstrate the efficiency and effectiveness of ISBOR compared to other basis-function-based OR approaches.
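To make the basis-function view of OR concrete, the following is a minimal sketch and not the ISBOR algorithm itself. It assumes Gaussian (RBF) basis functions and a cumulative-probit threshold likelihood, neither of which is specified in the abstract; the weights and thresholds are fixed by hand rather than learned by type-II maximum likelihood, and all function names are hypothetical.

```python
import math
import numpy as np

def rbf_design_matrix(X, centers, width=1.0):
    """Phi[i, j] = exp(-||x_i - c_j||^2 / (2 * width^2)) (assumed RBF basis)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

# Standard normal CDF applied elementwise; math.erf handles +/- infinity.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def ordinal_probabilities(f, thresholds, noise=1.0):
    """P(y = k | f) = cdf((b_k - f) / noise) - cdf((b_{k-1} - f) / noise),
    with b_0 = -inf and b_K = +inf (assumed probit threshold model)."""
    b = np.concatenate(([-np.inf], thresholds, [np.inf]))
    cdf = _norm_cdf((b[None, :] - f[:, None]) / noise)
    return cdf[:, 1:] - cdf[:, :-1]

# Toy data: four 1-D inputs, three ordered categories (0 < 1 < 2).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
centers = X[[0, 3]]            # a sparse model keeps only a few basis functions
w = np.array([-2.0, 2.0])      # hand-picked weights, for illustration only
f = rbf_design_matrix(X, centers) @ w   # latent score: linear combination of bases
probs = ordinal_probabilities(f, thresholds=np.array([-0.5, 0.5]))
pred = probs.argmax(axis=1)    # most probable category per input
```

Each row of `probs` is a full distribution over the ordered categories, which is one simple way a probabilistic OR model can expose prediction uncertainty. A sparse Bayesian method would additionally decide which columns of the design matrix to keep.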
