Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model

We consider the problem of listwise learning-to-rank (LTR) on data with \textit{partitioned preference}, where a set of items is divided into ordered and disjoint partitions, but the ranking of items within each partition is unknown. The Plackett-Luce (PL) model has been widely used in listwise LTR methods. However, given $N$ items with $M$ partitions, calculating the likelihood of data with partitioned preference under the PL model takes $O(N+S!)$ time, where $S$ is the maximum size of the top $M-1$ partitions. This computational challenge restricts existing PL-based listwise LTR methods to a special case of partitioned preference, \textit{top-$K$ ranking}, where the exact order of the top $K$ items is known. In this paper, we exploit a random utility model formulation of the PL model and propose an efficient approach that calculates the likelihood through numerical integration. This reduces the time complexity to $O(N+MS)$, making it feasible to train deep-neural-network-based ranking models with large output spaces. Through both simulation experiments and applications to real-world eXtreme Multi-Label (XML) classification tasks, we demonstrate that the proposed method outperforms well-known LTR baselines and remains scalable. The proposed method also achieves state-of-the-art performance on XML datasets with relatively large numbers of labels per sample.
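To make the numerical approach concrete, below is a minimal NumPy sketch (not the authors' released implementation; function names such as `partitioned_pl_log_likelihood` are illustrative). It assumes the standard random-utility view of the PL model, $u_i = s_i + \epsilon_i$ with i.i.d. standard Gumbel noise $\epsilon_i$, under which the partitioned-preference likelihood factorizes across partitions. Each factor $P(\min_{i \in S_m} u_i > \max_{j \in B_m} u_j)$, where $B_m$ is the union of all partitions below $S_m$, reduces to the one-dimensional integral $\int_0^1 \prod_{i \in S_m} (1 - x^{w_i/W_{B_m}})\,dx$ with $w_i = e^{s_i}$ and $W_{B_m} = \sum_{j \in B_m} w_j$ (obtained by substituting the CDF of $\max_{j \in B_m} u_j$ as the integration variable), which the sketch approximates with Gauss-Legendre quadrature.

```python
# A minimal sketch of the likelihood computation described above, assuming the
# Gumbel random-utility view of the PL model; names are illustrative, not from
# the paper's released code.
import numpy as np


def _log_factor(scores_A, log_WB, n_points=100):
    """log P(min_{i in A} u_i > max_{j in B} u_j) for Gumbel utilities
    u_i = s_i + eps_i, evaluated as the 1-D integral
    int_0^1 prod_{i in A} (1 - x**(w_i / W_B)) dx via Gauss-Legendre quadrature.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_points)
    x = 0.5 * (nodes + 1.0)        # map nodes from (-1, 1) to (0, 1)
    w = 0.5 * weights
    p = np.exp(scores_A - log_WB)  # exponents w_i / W_B, via log space for stability
    # Integrand at each node, accumulated in log space: sum_i log(1 - x**p_i).
    log_f = np.sum(np.log1p(-np.power(x[None, :], p[:, None])), axis=0)
    return np.log(np.dot(w, np.exp(log_f)))


def partitioned_pl_log_likelihood(scores, partitions, n_points=100):
    """Log-likelihood of a partitioned preference under the PL model.

    scores: length-N array of PL scores s_i.
    partitions: list of integer index arrays, ordered from the top partition
        down; the internal order of each partition is unobserved.
    """
    M = len(partitions)
    # Suffix log-sum-exp of the weights: log_WB[m] = log sum_{j below m} exp(s_j).
    # One O(N) pass keeps the total cost O(N + M * S * n_points).
    part_lse = [np.logaddexp.reduce(scores[idx]) for idx in partitions]
    log_WB = np.full(M, -np.inf)
    for m in range(M - 2, -1, -1):
        log_WB[m] = np.logaddexp(part_lse[m + 1], log_WB[m + 1])
    # The PL likelihood factorizes over partitions: each factor is the
    # probability that every item in partition m beats every item below it.
    return sum(
        _log_factor(scores[partitions[m]], log_WB[m], n_points)
        for m in range(M - 1)
    )
```

As a sanity check, with a singleton top partition $\{1\}$ above all other items, the factor reduces to the softmax probability $w_1 / \sum_j w_j$. With a fixed number of quadrature points the overall cost is $O(N + MS)$, and because the quadrature is a differentiable function of the scores, the negative log-likelihood can be used directly as a training loss for a neural scoring model.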
