Improving ranking performance with cost-sensitive ordinal classification via regression

This paper proposes a novel ranking approach, cost-sensitive ordinal classification via regression (COCR), which respects the discrete nature of the ordinal ranks in real-world data sets. In particular, COCR applies a theoretically sound method for reducing ordinal classification to binary classification and solves the binary sub-tasks with point-wise regression. Furthermore, COCR allows mis-ranking costs to be specified to further improve the ranking performance; this ability is exploited by deriving the cost that corresponds to a popular ranking criterion, expected reciprocal rank (ERR). The resulting ERR-tuned COCR combines the efficiency of point-wise regression with the accuracy of top-rank prediction under the ERR criterion. Evaluations on four large-scale benchmark data sets from the "Yahoo! Learning to Rank Challenge" and "Microsoft Learning to Rank" collections show that COCR significantly outperforms commonly used regression approaches.
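
For context, ERR [28] models a cascade user who scans a ranked list from the top and stops at the first satisfying document, so mistakes near the top of the list are penalized most heavily; this is the property the derived mis-ranking costs exploit. The definition below restates the criterion in the notation of [28] (g_i and g_max are theirs, not symbols introduced by this paper):

```latex
% Expected reciprocal rank (Chapelle et al. [28]): g_i is the graded
% relevance of the document at position i and g_max the largest grade,
% so R_i is the probability that the user is satisfied at position i.
\mathrm{ERR} = \sum_{r=1}^{n} \frac{1}{r}\, R_r \prod_{i=1}^{r-1} \left(1 - R_i\right),
\qquad
R_i = \frac{2^{g_i} - 1}{2^{g_{\max}}}
```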

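The reduction itself is simple to prototype. Below is a minimal sketch, not the authors' implementation: it assumes integer ranks in {0, ..., K-1}, uses scikit-learn's GradientBoostingRegressor as the point-wise base regressor, and takes an optional caller-supplied cost matrix (e.g., derived from ERR); the helper names cocr_train and cocr_score are hypothetical.

```python
# Minimal sketch of COCR: reduce K ordinal ranks to K-1 weighted binary
# sub-tasks ("is the true rank greater than k?") and solve each with
# point-wise regression. Hypothetical helpers; not the authors' code.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cocr_train(X, y, K, cost=None):
    """X: (n, d) features; y: (n,) integer ranks in {0, ..., K-1};
    cost: optional (n, K-1) per-example mis-ranking weights."""
    models = []
    for k in range(K - 1):
        b_k = (y > k).astype(float)         # binary target 1[y > k]
        w_k = None if cost is None else cost[:, k]
        reg = GradientBoostingRegressor()
        reg.fit(X, b_k, sample_weight=w_k)  # weighted point-wise regression
        models.append(reg)
    return models

def cocr_score(models, X):
    # Summing the (clipped) sub-task outputs estimates the expected rank;
    # sorting documents by this score yields the final ranking.
    preds = [np.clip(m.predict(X), 0.0, 1.0) for m in models]
    return np.sum(preds, axis=0)
```

With uniform costs this degenerates to plain ordinal classification via regression; the ERR-derived costs simply reweight the sub-tasks so that errors affecting top-ranked documents cost more.
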
[1] Qiang Wu, et al. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting, 2007, NIPS.

[2] Gregory N. Hullender, et al. Learning to rank using gradient descent, 2005, ICML.

[3] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SIGKDD Explorations.

[4] Zhaohui Zheng, et al. Learning to model relatedness for news recommendation, 2011, WWW.

[5] Jaana Kekäläinen, et al. Cumulated gain-based evaluation of IR techniques, 2002, TOIS.

[6] Eibe Frank, et al. A Simple Approach to Ordinal Classification, 2001, ECML.

[7] John Langford, et al. Cost-sensitive learning by cost-proportionate example weighting, 2003, ICDM.

[8] Christopher J. C. Burges, et al. From RankNet to LambdaRank to LambdaMART: An Overview, 2010.

[9] Qiang Wu, et al. Learning to Rank Using an Ensemble of Lambda-Gradient Models, 2010, Yahoo! Learning to Rank Challenge.

[10] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[11] J. R. Quinlan. Learning With Continuous Classes, 1992.

[12] Cristina V. Lopes, et al. Bagging gradient-boosted trees for high precision, low variance ranking models, 2011, SIGIR.

[13] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[14] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[15] Tong Zhang, et al. Subset Ranking Using Regression, 2006, COLT.

[16] Alistair Moffat, et al. Rank-biased precision for measurement of retrieval effectiveness, 2008, TOIS.

[17] Kilian Q. Weinberger, et al. Web-Search Ranking with Initialized Gradient Boosted Regression Trees, 2010, Yahoo! Learning to Rank Challenge.

[18] Eric Brill, et al. Beyond PageRank: machine learning for static ranking, 2006, WWW.

[19] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 2003, J. Mach. Learn. Res.

[20] Eyke Hüllermeier, et al. Binary Decomposition Methods for Multipartite Ranking, 2009, ECML/PKDD.

[21] Rong Jin, et al. Learning to Rank by Optimizing NDCG Measure, 2009, NIPS.

[22] Tie-Yan Liu, et al. Learning to Rank for Information Retrieval, 2011.

[23] Andrew Zisserman, et al. Advances in Neural Information Processing Systems (NIPS), 2007.

[24] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.

[25] Ling Li, et al. Reduction from Cost-Sensitive Ordinal Ranking to Weighted Binary Classification, 2012, Neural Computation.

[26] Ian H. Witten, et al. Induction of model trees for predicting continuous classes, 1996.

[27] Maksims Volkovs, et al. BoltzRank: learning to maximize expected ranking gain, 2009, ICML.

[28] Olivier Chapelle, et al. Expected reciprocal rank for graded relevance, 2009, CIKM.

[29] Andrew Trotman, et al. Sound and complete relevance assessment for XML retrieval, 2008, TOIS.

[30] Koby Crammer, et al. Pranking with Ranking, 2001, NIPS.

[31] Filip Radlinski, et al. A support vector method for optimizing average precision, 2007, SIGIR.

[32] Andreas Krause, et al. Advances in Neural Information Processing Systems (NIPS), 2014.

[33] Thorsten Joachims, et al. Online Structured Prediction via Coactive Learning, 2012, ICML.

[34] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[35] Thomas Hofmann, et al. Learning to Rank with Nonsmooth Cost Functions, 2006, NIPS.