A stochastic learning-to-rank algorithm and its application to contextual advertising

This paper addresses the problem of learning a model to rank objects (Web pages, ads, etc.). We propose a framework in which the ranking model is both optimized and evaluated using the same information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP). The main difficulty in optimizing NDCG and MAP directly is that these measures depend on the ranks of objects rather than on their scores, and are therefore not differentiable. Most learning-to-rank methods that attempt to optimize NDCG or MAP instead optimize a differentiable approximation of the measure. In this paper, we propose a simple yet effective stochastic optimization algorithm that directly minimizes any loss function for the learning-to-rank problem, including losses defined on NDCG or MAP. The algorithm combines Simulated Annealing with the Simplex method for its parameter search and seeks the globally optimal parameters. Experimental results with NDCG-Annealing, an instance of the proposed algorithm, on the LETOR benchmark data sets show that it is both effective and stable when compared to the baselines provided in LETOR 3.0. In addition, we applied the algorithm to ranking ads in contextual advertising. Our method has been shown to significantly improve relevance in offline evaluation and business metrics in online tests in a real large-scale ad serving system. To scale our computations, we parallelize the algorithm in a MapReduce framework running on Hadoop.
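To make the idea of direct optimization concrete, the following is a minimal sketch of simulated annealing applied to the loss 1 - NDCG. It is not the NDCG-Annealing algorithm itself: it assumes a linear scoring function, uses plain Gaussian perturbations in place of the Simplex-based search, and uses a geometric cooling schedule. All function names, parameters, and default values (`anneal`, `mean_ndcg`, `t0`, `cooling`, `step_size`) are illustrative assumptions.

```python
import math
import random


def dcg(labels):
    # Discounted cumulative gain of graded labels given in ranked order.
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels))


def ndcg(scores, labels):
    # Rank items by descending score, then normalize DCG by the ideal DCG.
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    ideal = dcg(sorted(labels, reverse=True))
    return dcg([labels[i] for i in order]) / ideal if ideal > 0 else 0.0


def mean_ndcg(weights, queries):
    # queries: list of (feature_rows, labels); assumed linear scorer w . x.
    total = 0.0
    for features, labels in queries:
        scores = [sum(w * x for w, x in zip(weights, row)) for row in features]
        total += ndcg(scores, labels)
    return total / len(queries)


def anneal(queries, dim, steps=2000, t0=1.0, cooling=0.995, step_size=0.5, seed=0):
    # Minimize loss = 1 - mean NDCG without gradients (Metropolis acceptance).
    rng = random.Random(seed)
    current = [0.0] * dim
    current_loss = 1.0 - mean_ndcg(current, queries)
    best, best_loss = list(current), current_loss
    temp = t0
    for _ in range(steps):
        # Assumed proposal: Gaussian perturbation of every weight.
        candidate = [w + rng.gauss(0.0, step_size) for w in current]
        loss = 1.0 - mean_ndcg(candidate, queries)
        delta = loss - current_loss
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, loss
            if loss < best_loss:
                best, best_loss = list(candidate), loss
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_loss
```

Because the loss is evaluated only through the ranking it induces, no smoothing of NDCG is needed; the same loop would work for 1 - MAP by swapping the metric function. Per-query metric evaluation is independent across queries, which is the part that lends itself to parallelization.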
