Feature Selection in Click-Through Rate Prediction Based on Gradient Boosting

Click-Through Rate (CTR) prediction is one of the key techniques in computational advertising. At present, CTR prediction is commonly performed with linear models combined with \(L_1\) regularization, which rely on prior manual feature engineering such as feature normalization and feature crossing. Such models cannot learn features automatically. Drawing on ensemble methods, this paper proposes a feature selection algorithm based on gradient boosting. The algorithm combines Gradient Boosting Decision Trees (GBDT) with Logistic Regression (LR), and is evaluated empirically on the Kaggle display-advertising CTR prediction dataset. The experimental results verify the feasibility and effectiveness of the proposed feature selection method. Moreover, the method improves the performance of the CTR prediction model, which reaches an AUC of 0.908.
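The abstract does not state how gradient boosting scores features. A common criterion, assumed here and not necessarily the paper's, is to sum the split gain each feature earns across the boosted trees and keep the top-k features as inputs to the LR model. The pure-Python sketch below uses depth-1 trees (stumps) fitted with a second-order logistic-loss gain; the function names `gbdt_feature_importance` and `select_features` and the defaults `n_rounds=20`, `lr=0.3`, `lam=1.0` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical sketch: gradient-boosted decision stumps on logistic loss,
# used only to score features by accumulated split gain. This scoring rule
# is an assumption; the paper's exact criterion is not given in the abstract.

Split = Tuple[float, Optional[int], float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def best_stump(X: List[List[float]], g: List[float], h: List[float],
               lam: float) -> Split:
    """Return (gain, feature, threshold, left_value, right_value) for the split
    with the largest second-order gain (up to a constant factor)."""
    G, H = sum(g), sum(h)
    parent = G * G / (H + lam)
    best: Split = (0.0, None, 0.0, 0.0, 0.0)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            GL += g[i]
            HL += h[i]
            if X[i][j] == X[nxt][j]:
                continue  # only split between distinct feature values
            GR, HR = G - GL, H - HL
            gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent
            if gain > best[0]:
                thr = (X[i][j] + X[nxt][j]) / 2.0
                # Newton-step leaf values: -sum(g) / (sum(h) + lambda)
                best = (gain, j, thr, -GL / (HL + lam), -GR / (HR + lam))
    return best


def gbdt_feature_importance(X: List[List[float]], y: List[int],
                            n_rounds: int = 20, lr: float = 0.3,
                            lam: float = 1.0) -> List[float]:
    """Boost n_rounds stumps on log loss; return total split gain per feature."""
    n = len(X)
    F = [0.0] * n  # current raw scores (log-odds)
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        g = [p[i] - y[i] for i in range(n)]  # first derivative of log loss w.r.t. F
        h = [max(p[i] * (1.0 - p[i]), 1e-12) for i in range(n)]  # second derivative
        gain, j, thr, w_left, w_right = best_stump(X, g, h, lam)
        if j is None:
            break  # no split improves the objective
        importance[j] += gain
        for i in range(n):
            F[i] += lr * (w_left if X[i][j] <= thr else w_right)
    return importance


def select_features(importance: List[float], k: int) -> List[int]:
    """Indices of the k features with the largest accumulated gain."""
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: feature 0 determines the label; features 1 and 2 are noise.
    X = [[float(i % 10), float(i % 3), float((3 * i) % 5)] for i in range(40)]
    y = [1 if row[0] >= 5 else 0 for row in X]
    imp = gbdt_feature_importance(X, y)
    print(select_features(imp, k=1))  # [0]
```

In this sketch the selected column indices would then form the input of the \(L_1\)-regularized LR model, with AUC on held-out data as the evaluation metric; that second stage is not shown. Stumps keep the gain bookkeeping short; deeper trees would credit each internal split's gain to its feature in the same way.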
