A neural network approach to ordinal regression

Ordinal regression is an important type of learning that combines properties of classification and regression. Here we describe an effective approach for adapting a traditional neural network to learn ordinal categories. Our approach generalizes the perceptron method for ordinal regression. On several benchmark datasets, our method (NNRank) outperforms a neural network classification method and achieves performance comparable to ordinal regression methods based on Gaussian processes and support vector machines. Moreover, NNRank retains the advantages of traditional neural networks: it learns in both online and batch modes, handles very large training datasets, and makes rapid predictions. These features make NNRank a useful and complementary tool for large-scale data mining tasks such as information retrieval, Web page ranking, collaborative filtering, and protein ranking in bioinformatics. The neural network software is available at: http://www.cs.missouri.edu/~chengji/cheng_software.html.
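The abstract does not spell out how the network's outputs are adapted to ordinal categories. As a minimal sketch, assuming a cumulative target encoding (category k of K is represented by k ones followed by K - k zeros, one sigmoid output per position), the encoding and decoding steps might look like the following. The function names `encode_ordinal` and `decode_ordinal` and the 0.5 threshold are illustrative assumptions, not taken from the NNRank software.

```python
from typing import List, Sequence


def encode_ordinal(rank: int, num_classes: int) -> List[int]:
    """Encode an ordinal category as a cumulative target vector.

    Assumed scheme (hypothetical helper): rank k in 1..num_classes becomes
    k ones followed by (num_classes - k) zeros, so neighbouring ranks
    differ in exactly one position.
    """
    if not 1 <= rank <= num_classes:
        raise ValueError("rank must lie in 1..num_classes")
    return [1] * rank + [0] * (num_classes - rank)


def decode_ordinal(outputs: Sequence[float], threshold: float = 0.5) -> int:
    """Turn per-position output activations into a predicted rank.

    Scans the outputs in order and counts how many exceed the threshold
    before the first one that does not; that count is the predicted rank.
    A result of 0 means even the first output fell below the threshold.
    """
    rank = 0
    for value in outputs:
        if value <= threshold:
            break
        rank += 1
    return rank


if __name__ == "__main__":
    print(encode_ordinal(3, 5))
    print(decode_ordinal([0.9, 0.8, 0.6, 0.3, 0.7]))
```

Under this assumed encoding, misclassifying a rank-3 item as rank 4 changes one target position, while misclassifying it as rank 5 changes two, so a standard per-output loss penalizes distant errors more heavily than a one-hot classification target would.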
