Ranking as Function Approximation

This paper gives an overview of the problem of learning to rank data and describes some current machine learning approaches to it. The cost functions used to assess the quality of a ranking algorithm present particular difficulties: they are non-differentiable as functions of the scores output by the ranker, and they are multivariate, in the sense that the cost associated with one ranked object depends on its relations to several other ranked objects. I present some ideas on a general framework for training with such cost functions; the approach has an appealing physical interpretation. The paper is tutorial in that it does not assume the reader is familiar with the methods of machine learning; my hope is that it will encourage applied mathematicians to explore this topic.
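As a small illustration of the two difficulties named in the abstract, the following sketch uses the fraction of misordered pairs as a stand-in ranking cost. This choice of cost, the function name `pairwise_error`, and the toy scores and labels are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def pairwise_error(scores, labels):
    """Fraction of differently-labeled pairs that the scores order incorrectly.

    Hypothetical helper for illustration: a higher label means the item
    should be ranked higher, and ties in score count as errors.
    """
    pairs = [(i, j) for i, j in combinations(range(len(scores)), 2)
             if labels[i] != labels[j]]
    wrong = 0
    for i, j in pairs:
        # hi is the item that should receive the larger score.
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        if scores[hi] <= scores[lo]:
            wrong += 1
    return wrong / len(pairs)


labels = [2, 1, 0]  # toy relevance labels (assumed)

# Non-differentiable: a small change in one score leaves the cost unchanged,
# so the gradient is zero wherever it is defined.
base = pairwise_error([0.9, 0.5, 0.7], labels)
nudged = pairwise_error([0.9, 0.5 + 1e-3, 0.7], labels)

# The cost only moves, in a jump, when two scores change order.
swapped = pairwise_error([0.9, 0.8, 0.7], labels)

# Multivariate: item 1 keeps its score of 0.5, yet whether it is ranked
# correctly depends on the score given to item 2.
other_moved = pairwise_error([0.9, 0.5, 0.4], labels)
```

Here `base` and `nudged` are equal, while `swapped` and `other_moved` both drop to zero, even though in the last case the score of the misranked item itself was never touched.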
