Beyond the Ground-Truth: An Evaluator-Generator Framework for Group-wise Learning-to-Rank in E-Commerce

Learning-to-rank (LTR) has become a key technology in e-commerce applications. Previous LTR approaches follow the supervised learning paradigm, so the learned models are required to match the labeled data point-wise or pair-wise. However, we observe that global context information, including the total order of items on the displayed webpage, can play an important role in interactions with customers. Approaching the best global ordering therefore requires exploring a large combinatorial space of items, which in turn requires evaluating orders that may never appear in the labeled data. In this scenario, we first show that classical data-based metrics can be inconsistent with online performance, or even misleading. We then propose to learn an evaluator and to search for the best model under the evaluator's guidance, which forms an evaluator-generator framework for training the group-wise LTR model. The evaluator is learned from the labeled data and is enhanced by incorporating order context information. The generator is trained under the supervision of the evaluator by reinforcement learning to generate the best order in the combinatorial space. Our experiments on one of the world's largest retail platforms show that the learned evaluator is a much better indicator than classical data-based metrics. Moreover, our LTR model achieves a significant improvement (>2%) over current industrial-level pair-wise models in terms of both Conversion Rate (CR) and Gross Merchandise Volume (GMV) in online A/B tests.
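The following is a minimal Python/PyTorch sketch of the evaluator-generator idea described above, not the authors' implementation: a GRU-based evaluator scores a displayed order of items (so the score depends on order context), and a simple pointer-style generator samples permutations and is updated with REINFORCE, using the evaluator's score as the reward. The module names, network sizes, and the mean-reward baseline are assumptions made for illustration; in the paper's setting the evaluator would first be trained on logged labeled data before guiding the generator.

# Minimal sketch of an evaluator-generator loop (illustrative assumptions only).
import torch
import torch.nn as nn


class ContextEvaluator(nn.Module):
    """Scores an ordered list of item features; the GRU captures order context."""

    def __init__(self, item_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(item_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, ordered_items: torch.Tensor) -> torch.Tensor:
        # ordered_items: (batch, list_len, item_dim)
        _, h = self.rnn(ordered_items)               # h: (1, batch, hidden)
        return self.head(h.squeeze(0)).squeeze(-1)   # (batch,) list-level score


class PointerGenerator(nn.Module):
    """Sequentially picks the next item from the remaining candidates."""

    def __init__(self, item_dim: int, hidden: int = 64):
        super().__init__()
        self.proj = nn.Linear(item_dim, hidden)
        self.rnn_cell = nn.GRUCell(item_dim, hidden)

    def sample_order(self, items: torch.Tensor):
        """items: (batch, n, item_dim) -> (permutation indices, summed log-probs)."""
        batch, n, _ = items.shape
        arange = torch.arange(batch, device=items.device)
        keys = self.proj(items)                             # (batch, n, hidden)
        state = keys.new_zeros(batch, keys.size(-1))        # decoder state
        mask = items.new_zeros(batch, n, dtype=torch.bool)  # already-picked items
        picks, log_probs = [], []
        for _ in range(n):
            scores = torch.einsum("bh,bnh->bn", state, keys)  # attention over candidates
            scores = scores.masked_fill(mask, float("-inf"))
            dist = torch.distributions.Categorical(logits=scores)
            pick = dist.sample()                              # (batch,)
            log_probs.append(dist.log_prob(pick))
            mask[arange, pick] = True
            state = self.rnn_cell(items[arange, pick], state)  # feed chosen item back in
            picks.append(pick)
        return torch.stack(picks, dim=1), torch.stack(log_probs, dim=1).sum(dim=1)


def reinforce_step(generator, evaluator, items, optimizer):
    """One REINFORCE update: the evaluator's score of the sampled order is the reward."""
    order, log_prob = generator.sample_order(items)
    reordered = torch.gather(
        items, 1, order.unsqueeze(-1).expand(-1, -1, items.size(-1)))
    with torch.no_grad():                                    # evaluator stays fixed here
        reward = evaluator(reordered)                        # (batch,)
    baseline = reward.mean()                                 # simple variance-reduction baseline
    loss = -((reward - baseline) * log_prob).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Illustrative usage with random features (8 candidate items of dimension 16):
items = torch.randn(32, 8, 16)
evaluator = ContextEvaluator(item_dim=16)     # assumed pre-trained on logged data
generator = PointerGenerator(item_dim=16)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
reinforce_step(generator, evaluator, items, opt)

Freezing the evaluator inside the policy update mirrors the framework's separation of concerns: the evaluator is fit to labeled data, while the generator alone explores the combinatorial space of orders guided by that evaluator.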
