Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising

In display advertising, click through rate (CTR) prediction is the problem of estimating the probability that an advertisement (ad) is clicked when displayed to a user in a specific context. Due to its easy implementation and promising performance, logistic regression (LR) model has been widely used for CTR prediction, especially in industrial systems. However, it is not easy for LR to capture the nonlinear information, such as the conjunction information, from user features and ad features. In this paper, we propose a novel model, called coupled group lasso (CGL), for CTR prediction in display advertising. CGL can seamlessly integrate the conjunction information from user features and ad features for modeling. Furthermore, CGL can automatically eliminate useless features for both users and ads, which may facilitate fast online prediction. Scalability of CGL is ensured through feature hashing and distributed implementation. Experimental results on real-world data sets show that our CGL model can achieve state-of-the-art performance on webscale CTR prediction tasks.

[1]  S. Muthukrishnan,et al.  Ad Exchanges: Research Issues , 2009, WINE.

[2]  Gene H. Golub,et al.  Tikhonov Regularization and Total Least Squares , 1999, SIAM J. Matrix Anal. Appl..

[3]  Sachin Garg,et al.  Response prediction using collaborative filtering with hierarchies and side-information , 2011, KDD.

[4]  Rajiv Khanna,et al.  Estimating rates of rare events with multiple hierarchies through scalable log-linear models , 2010, KDD '10.

[5]  Rob Malouf,et al.  A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.

[6]  J. Nocedal Updating Quasi-Newton Matrices With Limited Storage , 1980 .

[7]  Kilian Q. Weinberger,et al.  Feature hashing for large scale multitask learning , 2009, ICML '09.

[8]  Rómer Rosales,et al.  Simple and Scalable Response Prediction for Display Advertising , 2014, ACM Trans. Intell. Syst. Technol..

[9]  Matthew Richardson,et al.  Predicting clicks: estimating the click-through rate for new ads , 2007, WWW '07.

[10]  Joaquin Quiñonero Candela,et al.  Web-Scale Bayesian Click-Through rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine , 2010, ICML.

[11]  Thore Graepel,et al.  WWW 2009 MADRID! Track: Data Mining / Session: Statistical Methods Matchbox: Large Scale Online Bayesian Recommendations , 2022 .

[12]  Mohammad Mahdian,et al.  Pay-per-action Model for Online Advertising , 2007, WINE.

[13]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[14]  Martin Wattenberg,et al.  Ad click prediction: a view from the trenches , 2013, KDD.

[15]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[16]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[17]  Jorge Nocedal,et al.  Representations of quasi-Newton matrices and their use in limited memory methods , 1994, Math. Program..

[18]  Jianfeng Gao,et al.  Scalable training of L1-regularized log-linear models , 2007, ICML '07.

[19]  C. G. Broyden The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations , 1970 .

[20]  P. Bühlmann,et al.  The group lasso for logistic regression , 2008 .

[21]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[22]  Wentong Li,et al.  Estimating conversion rate in display advertising from past erformance data , 2012, KDD.