Data Selection Techniques for Large-Scale Rank SVM

Learning to rank has become a popular research topic in areas such as information retrieval and machine learning. Pair-wise ranking, which learns all the order preferences between pairs of examples, is a typical approach to the ranking problem. Among pair-wise methods, Rank SVM is a widely used algorithm that has been applied successfully in previous work. However, Rank SVM suffers from long training times because it must process a huge number of pairs. In this paper, we propose a data selection technique, Pruned Rank SVM, that selects the most informative pairs before training. Experimental results show that the performance of Pruned Rank SVM is on par with that of Rank SVM while using significantly fewer pairs.
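To illustrate why the number of pairs becomes a bottleneck, the following is a minimal sketch of the standard pair-wise transformation that precedes Rank SVM training. It is not the Pruned Rank SVM selection procedure, which is not specified here. The function name `pairwise_differences` and the arguments `features`, `relevance`, and `query_ids` are hypothetical names chosen for illustration.

```python
from itertools import combinations


def pairwise_differences(features, relevance, query_ids):
    """Build pair-wise examples (x_i - x_j, label) within each query.

    Hypothetical helper for illustration only: label is +1 when example i
    is more relevant than example j, and -1 when it is less relevant.
    Pairs from different queries or with equal relevance are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if query_ids[i] != query_ids[j] or relevance[i] == relevance[j]:
            continue
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if relevance[i] > relevance[j] else -1
        pairs.append((diff, label))
    return pairs


# Toy data: three documents for query 1, one document for query 2.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
relevance = [2, 1, 0, 1]
query_ids = [1, 1, 1, 2]

pairs = pairwise_differences(features, relevance, query_ids)
print(len(pairs))
```

A query with n documents of distinct relevance yields n(n-1)/2 pairs, so the training set grows quadratically with the number of documents per query. This growth is the cost that selecting a subset of pairs before training is meant to reduce.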
