Hidden Conditional Random Fields with Deep User Embeddings for Ad Targeting

Estimating a user's propensity to click on a display ad or purchase a particular item is a critical task in targeted advertising, a burgeoning online industry worth billions of dollars. More accurate estimation methods improve the online user experience, since only relevant and interesting ads are shown, and can also bring large benefits to advertisers, since targeted users are more likely to click or make a purchase. In this paper we address this problem and propose an approach for estimating ad click or conversion probability from the sequence of a user's online actions, modeled with a Hidden Conditional Random Field (HCRF). In addition, to address the sparsity of the HCRF model's inputs, we propose to learn distributed, low-dimensional representations of user actions with a directed skip-gram, a neural architecture suited to sequential data. Experimental results on a real-world data set comprising thousands of user sessions collected on Yahoo servers indicate the benefits and potential of the proposed approach, which outperformed competing state-of-the-art algorithms and obtained significant improvements in terms of retrieval measures.
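The abstract describes classifying a user's action sequence with an HCRF whose inputs are dense action embeddings. The snippet below is a minimal sketch of how a chain-structured HCRF computes p(y | x) = sum_h exp(psi(y, h, x)) / sum_{y', h} exp(psi(y', h, x)), marginalizing over hidden-state chains with the forward algorithm. It is not the authors' implementation. The potential psi used here (embedding-to-hidden-state weights, label-to-hidden-state weights, and label-specific hidden-state transitions), the weight names `w_obs`, `w_label`, and `w_trans`, and the toy dimensions are all hypothetical. The action embeddings are assumed to come from a separately trained model such as the directed skip-gram, which is not shown.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def log_partition_for_label(
    x: List[List[float]],              # T x D: one (assumed) embedding per user action
    w_obs: List[List[float]],          # H x D: hidden state <-> embedding weights (hypothetical)
    w_label: List[List[float]],        # Y x H: label <-> hidden state weights (hypothetical)
    w_trans: List[List[List[float]]],  # Y x H x H: label-specific transitions (hypothetical)
    y: int,
) -> float:
    """Return log sum_h exp(psi(y, h, x)) over all hidden chains h via the forward algorithm."""
    num_hidden = len(w_obs)
    # alpha[j] = log of the summed exp-scores of all partial chains ending in hidden state j
    alpha = [dot(w_obs[j], x[0]) + w_label[y][j] for j in range(num_hidden)]
    for t in range(1, len(x)):
        alpha = [
            logsumexp([alpha[i] + w_trans[y][i][j] for i in range(num_hidden)])
            + dot(w_obs[j], x[t])
            + w_label[y][j]
            for j in range(num_hidden)
        ]
    return logsumexp(alpha)


def hcrf_label_probs(
    x: List[List[float]],
    w_obs: List[List[float]],
    w_label: List[List[float]],
    w_trans: List[List[List[float]]],
) -> List[float]:
    """Return p(y | x) for every label y, with the hidden chain marginalized out."""
    scores = [
        log_partition_for_label(x, w_obs, w_label, w_trans, y)
        for y in range(len(w_label))
    ]
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


if __name__ == "__main__":
    # Toy session: 3 actions with 2-d embeddings, 2 hidden states,
    # labels {0: no click, 1: click}. All numbers are made up for illustration.
    session = [[1.0, 0.0], [0.5, -0.5], [0.0, 1.0]]
    w_obs = [[0.2, -0.1], [-0.3, 0.4]]
    w_label = [[0.1, -0.2], [-0.1, 0.3]]
    w_trans = [[[0.5, -0.5], [0.0, 0.2]], [[-0.2, 0.1], [0.3, 0.4]]]
    print(hcrf_label_probs(session, w_obs, w_label, w_trans))
```

Working in log space keeps the forward recursion stable for long sessions, and the cost is O(Y * T * H^2) rather than the O(Y * H^T) of enumerating every hidden chain. In practice the weights would be learned by maximizing the conditional log-likelihood of labeled sessions, which this sketch does not cover.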
