Context-Aware Learning to Rank with Self-Attention

Learning to rank is a key component of many e-commerce search engines. In learning to rank, one is interested in optimising the global ordering of a list of items according to their utility for users. Popular approaches learn a scoring function that scores items individually (i.e. without the context of other items in the list) by optimising a pointwise, pairwise or listwise loss. The list is then sorted in descending order of the scores. Possible interactions between items present in the same list are taken into account during training at the loss level. However, during inference, items are scored individually, and possible interactions between them are not considered. In this paper, we propose a context-aware neural network model that learns item scores by applying a self-attention mechanism. The relevance of a given item is thus determined in the context of all other items present in the list, both during training and during inference. We empirically demonstrate significant performance gains of the self-attention-based neural architecture over Multi-Layer Perceptron baselines, in particular on a dataset coming from the search logs of a large-scale e-commerce marketplace, allegro.pl. This effect is consistent across popular pointwise, pairwise and listwise losses. Finally, we report new state-of-the-art results on MSLR-WEB30K, the learning-to-rank benchmark.
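To make the core idea concrete, the sketch below shows how a list of items can be scored jointly rather than individually, so that each item's score depends on every other item via self-attention. This is a minimal illustration built on PyTorch's stock nn.TransformerEncoder, not the authors' released implementation; the module name ContextAwareRanker, the layer sizes and the hyper-parameters are illustrative assumptions.

```python
# Minimal sketch of a context-aware ranker (illustrative, not the paper's
# exact architecture): every item in a list is scored jointly, so each
# score is conditioned on all other items through self-attention.
import torch
import torch.nn as nn


class ContextAwareRanker(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, n_features: int, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Project raw item features into the model dimension.
        self.input_proj = nn.Linear(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # One scalar score per item, computed from its contextualised embedding.
        self.score = nn.Linear(d_model, 1)

    def forward(self, items: torch.Tensor,
                padding_mask: torch.Tensor) -> torch.Tensor:
        # items: (batch, list_len, n_features)
        # padding_mask: (batch, list_len), True where the slot is padding,
        # so attention ignores padded positions in variable-length lists.
        h = self.input_proj(items)
        h = self.encoder(h, src_key_padding_mask=padding_mask)
        return self.score(h).squeeze(-1)  # (batch, list_len): score per item


# Usage: score a batch of 2 lists of 10 items, 136 features each
# (136 matches MSLR-WEB30K's feature count; an assumption for this demo).
model = ContextAwareRanker(n_features=136)
x = torch.randn(2, 10, 136)
mask = torch.zeros(2, 10, dtype=torch.bool)  # no padding in this toy batch
scores = model(x, mask)
```

At inference each list is simply sorted by descending score; because the scores come out of the self-attention encoder, the same architecture can be trained under a pointwise, pairwise or listwise loss while still modelling item interactions at scoring time.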
