Neural Choice by Elimination via Highway Networks

We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the full item set and iteratively removes the least worthy item from the remaining subset. We prove that choice by elimination is equivalent to marginalizing out random Gompertz latent utilities. The choice model is coupled with the recently introduced Highway Networks, which can approximate arbitrarily complex rank functions. We evaluate the framework on a large-scale public dataset with over 425K items drawn from the Yahoo! Learning to Rank Challenge, and show that the proposed method is competitive with state-of-the-art learning-to-rank methods.
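To make the two ingredients of the abstract concrete, here is a minimal NumPy sketch (not the authors' code): a single highway layer that maps item features to a scalar worth, and an elimination-style log-likelihood in which the least worthy remaining item is removed at each step with probability proportional to exp(-worth). The function names, the summed readout of the highway layer, and the exp(-worth) parameterisation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: a gated mix of a nonlinear transform and the input."""
    h = np.tanh(x @ W_h + b_h)                   # candidate transform
    t = 1.0 / (1.0 + np.exp(-(x @ W_t + b_t)))   # transform gate in (0, 1)
    return t * h + (1.0 - t) * x                 # carry the rest of x through

def elimination_log_likelihood(scores, elimination_order):
    """Log-probability of removing items in `elimination_order` (worst first),
    where item i is eliminated from the remaining pool with probability
    exp(-scores[i]) / sum_j exp(-scores[j])."""
    remaining = list(range(len(scores)))
    log_lik = 0.0
    for i in elimination_order:
        neg = -scores[remaining]
        m = neg.max()
        log_norm = m + np.log(np.exp(neg - m).sum())   # stable log-sum-exp
        log_lik += -scores[i] - log_norm
        remaining.remove(i)
    return log_lik

# Toy usage: score 5 items with 8 features each, then eliminate worst first.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_h, W_t = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
b_h, b_t = np.zeros(8), np.zeros(8)
worth = highway_layer(X, W_h, b_h, W_t, b_t).sum(axis=1)  # scalar worth per item
order = list(np.argsort(worth))                           # least worthy first
print(elimination_log_likelihood(worth, order))
```

Reading the elimination order bottom-up recovers a full ranking, which is why the per-step elimination probabilities compose into a likelihood over permutations.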
