SVM selective sampling for ranking with application to data retrieval

Learning ranking (or preference) functions has been a major issue in the machine learning community and has produced many applications in information retrieval. SVMs (Support Vector Machines) - a classification and regression methodology - have also shown excellent performance in learning ranking functions. They effectively learn ranking functions of high generalization based on the "large-margin" principle and also systematically support nonlinear ranking by the "kernel trick". In this paper, we propose an SVM selective sampling technique for learning ranking functions. SVM selective sampling (or active learning with SVM) has been studied in the context of classification. Such techniques reduce the labeling effort in learning classification functions by selecting only the most informative samples to be labeled. However, they are not extendable to learning ranking functions, as the labeled data in ranking is relative ordering, or partial orders of data. Our proposed sampling technique effectively learns an accurate SVM ranking function with fewer partial orders. We apply our sampling technique to the data retrieval application, which enables fuzzy search on relational databases by interacting with users for learning their preferences. Experimental results show a significant reduction of the labeling effort in inducing accurate ranking functions.

[1]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..

[2]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[3]  Eyke Hüllermeier,et al.  Pairwise Preference Learning and Ranking , 2003, ECML.

[4]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[5]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[6]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[7]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[8]  Seung-won Hwang,et al.  RankFP: a framework for supporting rank formulation and processing , 2005, 21st International Conference on Data Engineering (ICDE'05).

[9]  Dan Roth,et al.  Constraint Classification: A New Approach to Multiclass Classification , 2002, ALT.

[10]  Daphne Koller,et al.  Support Vector Machine Active Learning with Application sto Text Classification , 2000, ICML.

[11]  Ralf Herbrich,et al.  Large margin rank boundaries for ordinal regression , 2000 .

[12]  Luis Gravano,et al.  Evaluating top-k queries over Web-accessible databases , 2002, Proceedings 18th International Conference on Data Engineering.

[13]  Greg Schohn,et al.  Less is More: Active Learning with Support Vector Machines , 2000, ICML.

[14]  Vagelis Hristidis,et al.  PREFER: a system for the efficient execution of multi-parametric ranked queries , 2001, SIGMOD '01.

[15]  Klaus Brinker,et al.  Active learning of label ranking functions , 2004, ICML.

[16]  Yoram Singer,et al.  Learning to Order Things , 1997, NIPS.