Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model

We consider the problem of listwise learning-to-rank (LTR) on data with \textit{partitioned preference}, where a set of items is divided into ordered and disjoint partitions, but the ranking of items within each partition is unknown. The Plackett-Luce (PL) model has been widely used in listwise LTR methods. However, given $N$ items with $M$ partitions, calculating the likelihood of data with partitioned preference under the PL model takes $O(N+S!)$ time, where $S$ is the maximum size of the top $M-1$ partitions. This computational challenge restricts existing PL-based listwise LTR methods to a special case of partitioned preference, \textit{top-$K$ ranking}, where the exact order of the top $K$ items is known. In this paper, we exploit a random utility model formulation of the PL model and propose an efficient approach that calculates the likelihood through numerical integration. This reduces the time complexity to $O(N+MS)$, making it feasible to train deep-neural-network-based ranking models with large output spaces. Through both simulation experiments and applications to real-world eXtreme Multi-Label (XML) classification tasks, we demonstrate that the proposed method outperforms well-known LTR baselines and remains scalable. The proposed method also achieves state-of-the-art performance on XML datasets with relatively large numbers of labels per sample.
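To make the numerical approach concrete, below is a minimal NumPy sketch (not the authors' released implementation; function names such as `partitioned_pl_log_likelihood` are illustrative). It assumes the standard random-utility view of the PL model, $u_i = s_i + \epsilon_i$ with i.i.d. standard Gumbel noise $\epsilon_i$, under which the partitioned-preference likelihood factorizes across partitions. Each factor $P(\min_{i \in S_m} u_i > \max_{j \in B_m} u_j)$, where $B_m$ is the union of all partitions below $S_m$, reduces to the one-dimensional integral $\int_0^1 \prod_{i \in S_m} (1 - x^{w_i/W_{B_m}})\,dx$ with $w_i = e^{s_i}$ and $W_{B_m} = \sum_{j \in B_m} w_j$ (obtained by substituting the CDF of $\max_{j \in B_m} u_j$ as the integration variable), which the sketch approximates with Gauss-Legendre quadrature.

```python
# A minimal sketch of the likelihood computation described above, assuming the
# Gumbel random-utility view of the PL model; names are illustrative, not from
# the paper's released code.
import numpy as np


def _log_factor(scores_A, log_WB, n_points=100):
    """log P(min_{i in A} u_i > max_{j in B} u_j) for Gumbel utilities
    u_i = s_i + eps_i, evaluated as the 1-D integral
    int_0^1 prod_{i in A} (1 - x**(w_i / W_B)) dx via Gauss-Legendre quadrature.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_points)
    x = 0.5 * (nodes + 1.0)        # map nodes from (-1, 1) to (0, 1)
    w = 0.5 * weights
    p = np.exp(scores_A - log_WB)  # exponents w_i / W_B, via log space for stability
    # Integrand at each node, accumulated in log space: sum_i log(1 - x**p_i).
    log_f = np.sum(np.log1p(-np.power(x[None, :], p[:, None])), axis=0)
    return np.log(np.dot(w, np.exp(log_f)))


def partitioned_pl_log_likelihood(scores, partitions, n_points=100):
    """Log-likelihood of a partitioned preference under the PL model.

    scores: length-N array of PL scores s_i.
    partitions: list of integer index arrays, ordered from the top partition
        down; the internal order of each partition is unobserved.
    """
    M = len(partitions)
    # Suffix log-sum-exp of the weights: log_WB[m] = log sum_{j below m} exp(s_j).
    # One O(N) pass keeps the total cost O(N + M * S * n_points).
    part_lse = [np.logaddexp.reduce(scores[idx]) for idx in partitions]
    log_WB = np.full(M, -np.inf)
    for m in range(M - 2, -1, -1):
        log_WB[m] = np.logaddexp(part_lse[m + 1], log_WB[m + 1])
    # The PL likelihood factorizes over partitions: each factor is the
    # probability that every item in partition m beats every item below it.
    return sum(
        _log_factor(scores[partitions[m]], log_WB[m], n_points)
        for m in range(M - 1)
    )
```

As a sanity check, with a singleton top partition $\{1\}$ above all other items, the factor reduces to the softmax probability $w_1 / \sum_j w_j$. With a fixed number of quadrature points the overall cost is $O(N + MS)$, and because the quadrature is a differentiable function of the scores, the negative log-likelihood can be used directly as a training loss for a neural scoring model.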
