Training linear ranking SVMs in linearithmic time using red-black trees

We introduce an efficient method for training the linear ranking support vector machine (RankSVM). The method combines cutting plane optimization with a red-black tree based approach to subgradient calculations, and has O(ms + m log(m)) time complexity, where m is the number of training examples and s is the average number of non-zero features per example. The best previously known training algorithms achieve the same efficiency only for restricted special cases, whereas the proposed approach allows arbitrary real-valued utility scores in the training data. Experiments demonstrate the superior scalability of the proposed approach compared to the fastest existing RankSVM implementations.
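To illustrate the tree-based subgradient idea, consider the pairwise hinge loss L(w) = Σ_{(i,j): y_i > y_j} max(0, 1 − (w·x_i − w·x_j)), which has O(m²) preference pairs if evaluated naively. The sketch below shows how per-example violation counts recover both the loss value and the subgradient coefficients in O(m log m) after sorting by utility. It is a minimal reconstruction from the abstract, not the paper's implementation: it uses sortedcontainers.SortedList as a stand-in for an augmented red-black tree (both support O(log m) insertion and rank queries), and all function and variable names are illustrative.

```python
import numpy as np
from sortedcontainers import SortedList

def pairwise_loss_and_subgradient(X, y, w):
    """Pairwise hinge loss over all pairs with y_i > y_j, plus a
    subgradient, in O(ms + m log m) time instead of O(m^2)."""
    s = X @ w                      # predicted scores, O(ms) for sparse X
    m = len(y)
    order = np.argsort(y, kind="stable")
    lo = np.zeros(m)               # violated pairs with k as the lower-utility element
    hi = np.zeros(m)               # violated pairs with k as the higher-utility element

    # Sweep 1: increasing utility; the tree holds scores of strictly
    # lower-utility examples already seen. A pair (k, j) with y_k > y_j
    # is violated iff s_j > s_k - 1.
    tree = SortedList()
    k = 0
    while k < m:
        # query a whole tie block before inserting any of its members,
        # since examples with equal utility form no preference pairs
        block = [order[k]]
        while k + 1 < m and y[order[k + 1]] == y[order[k]]:
            k += 1
            block.append(order[k])
        for idx in block:          # count lower-utility scores > s_k - 1
            hi[idx] = len(tree) - tree.bisect_right(s[idx] - 1)
        for idx in block:
            tree.add(s[idx])
        k += 1

    # Sweep 2: decreasing utility; the tree holds scores of strictly
    # higher-utility examples. A pair (i, k) with y_i > y_k is violated
    # iff s_i < s_k + 1.
    tree = SortedList()
    k = m - 1
    while k >= 0:
        block = [order[k]]
        while k - 1 >= 0 and y[order[k - 1]] == y[order[k]]:
            k -= 1
            block.append(order[k])
        for idx in block:          # count higher-utility scores < s_k + 1
            lo[idx] = tree.bisect_left(s[idx] + 1)
        for idx in block:
            tree.add(s[idx])
        k -= 1

    # Each violated pair (i, j) with y_i > y_j contributes 1 - s_i + s_j,
    # so the loss decomposes over the per-example counts.
    loss = hi.sum() - (s * hi).sum() + (s * lo).sum()
    # x_k enters the subgradient with -1 per pair where it is preferred
    # and +1 per pair where it is not.
    grad = X.T @ (lo - hi)
    return loss, grad
```

Both sweeps cost O(m log m), and together with the two matrix-vector products this matches the stated O(ms + m log(m)) bound per subgradient evaluation inside the cutting plane loop. Because the sweeps only compare utility values, ties aside, nothing restricts y to a fixed set of relevance levels, which is what allows arbitrary real-valued utility scores.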
