Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction

Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications, including web search, personalised recommendation, and online advertising. Unlike the continuous raw features usually found in the image and audio domains, input features in the web space are typically multi-field, mostly discrete and categorical, and their dependencies are little known. Major user response prediction models either limit themselves to linear models or require manually built high-order combination features. The former loses the ability to explore feature interactions, while the latter incurs heavy computation in the large feature space. To tackle this issue, we propose two novel models that use deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and predict users' ad clicks. To make our DNNs work efficiently, we propose leveraging three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. Large-scale experiments on real-world data demonstrate that our methods outperform major state-of-the-art models.
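To make "multi-field categorical input" and "FM as a feature transformation" concrete, here is a minimal pure-Python sketch under stated assumptions. The field names and vocabularies (`weekday`, `gender`, `city`) are hypothetical toy data, the weights are random rather than trained, and RBM/DAE pre-training is not shown. The FM score uses the standard second-order factorisation machine with the usual O(kn) rewrite of the pairwise term. The dense layer that concatenates `[w_i, v_i]` for the single active feature of each field, followed by a one-hidden-layer network, is an assumed illustration of how FM parameters could seed a DNN; it is not presented as the paper's exact architecture.

```python
import math
import random

# Hypothetical toy schema: each field is categorical with its own vocabulary.
FIELDS = {
    "weekday": ["Mon", "Tue", "Wed"],
    "gender": ["male", "female"],
    "city": ["London", "Beijing", "Shanghai", "Boston"],
}

def build_index(fields):
    """Assign every (field, value) pair a global feature index."""
    index = {}
    for field, values in fields.items():
        for value in values:
            index[(field, value)] = len(index)
    return index

def encode(record, index):
    """Multi-field one-hot encoding: exactly one active index per field."""
    return [index[(field, value)] for field, value in record.items()]

def fm_score(active, w0, w, v):
    """Second-order FM logit for binary one-hot input x (x_i = 1 iff i is active).
    Uses sum_{i<j} <v_i, v_j> = 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2]."""
    linear = w0 + sum(w[i] for i in active)
    pair = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] for i in active)
        sq = sum(v[i][f] ** 2 for i in active)
        pair += 0.5 * (s * s - sq)
    return linear + pair

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fm_embedding(active, w, v):
    """Assumed dense layer: concatenate [w_i, v_i] for each field's active feature."""
    dense = []
    for i in active:
        dense.append(w[i])
        dense.extend(v[i])
    return dense

def mlp_predict(dense, W1, b1, w2, b2):
    """One tanh hidden layer followed by a sigmoid output unit."""
    hidden = [math.tanh(sum(a * x for a, x in zip(row, dense)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(a * h for a, h in zip(w2, hidden)) + b2)

rng = random.Random(0)
index = build_index(FIELDS)
n, k = len(index), 4
w0 = 0.0
w = [rng.gauss(0, 0.1) for _ in range(n)]
v = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]

record = {"weekday": "Tue", "gender": "female", "city": "London"}
active = encode(record, index)  # active == [1, 4, 5]
p_fm = sigmoid(fm_score(active, w0, w, v))

dense = fm_embedding(active, w, v)  # len(dense) == 3 fields * (1 + k) == 15
hidden_size = 8
W1 = [[rng.gauss(0, 0.1) for _ in range(len(dense))] for _ in range(hidden_size)]
b1 = [0.0] * hidden_size
w2 = [rng.gauss(0, 0.1) for _ in range(hidden_size)]
p_dnn = mlp_predict(dense, W1, b1, w2, 0.0)
print(active, p_fm, p_dnn)
```

Because each field contributes exactly one active feature, the dense vector has a fixed length of (number of fields) × (1 + k) regardless of the total vocabulary size. This is what makes a fully connected network over sparse categorical data tractable in this sketch.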
