Learning Bregman Distance Functions for Structural Learning to Rank

We study content-based learning to rank from the perspective of learning distance functions. The two key components of learning to rank, feature mappings and score functions, are usually modeled separately, and learning is usually restricted to a linear distance function such as the Mahalanobis distance. However, feature mappings and score functions interact with each other, and the patterns underlying the data are likely to be complicated and nonlinear. As a general family of nonlinear distances, the Bregman distance is therefore well suited to learning to rank: it generalizes many common distance functions, and its nonlinearity lets it capture general patterns in data distributions. In this paper, we treat learning to rank as a structural learning problem and devise a Bregman distance function to build the ranking model on top of structural SVM. To make the model robust to outliers, we develop a robust structural learning framework for the ranking model. The proposed model, Robust Structural Bregman distance functions Learning to Rank (RSBLR), is a general and unified framework for learning distance functions to rank. Experiments on real-world ranking datasets show that the method outperforms state-of-the-art approaches and is robust to noisily labeled outliers.
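As a rough illustration of why the Bregman family is described as general, the sketch below evaluates the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for two textbook choices of the convex function phi. The helper names (`bregman`, `sq_norm`, `neg_entropy`) and the sample vectors are assumptions made for this example; the abstract does not say which convex function RSBLR learns, so this is not the paper's model.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman distance D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    g = grad_phi(y)
    inner = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
    return phi(x) - phi(y) - inner

# Choice 1 (illustrative): phi(x) = ||x||^2 gives the squared Euclidean distance.
def sq_norm(x):
    return sum(xi * xi for xi in x)

def sq_norm_grad(x):
    return [2.0 * xi for xi in x]

# Choice 2 (illustrative): negative entropy gives the generalized KL divergence.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

# Hypothetical feature vectors with strictly positive entries.
x = [0.2, 0.8]
y = [0.5, 0.5]

d_euclid = bregman(sq_norm, sq_norm_grad, x, y)
d_kl_xy = bregman(neg_entropy, neg_entropy_grad, x, y)
d_kl_yx = bregman(neg_entropy, neg_entropy_grad, y, x)

print(d_euclid, d_kl_xy, d_kl_yx)
```

The first choice recovers a quadratic distance of the same kind as the Mahalanobis distance with an identity matrix, while the second is asymmetric in its arguments, which shows that a Bregman distance need not be a metric.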
