Large Scale CVR Prediction through Dynamic Transfer Learning of Global and Local Features

This paper presents a combination of strategies for conversion rate (CVR) prediction deployed at Brightroll, the Yahoo! demand-side platform (DSP), aimed at modeling extremely high-dimensional, sparse data with limited human intervention. We propose a novel probabilistic generative model, named Dynamic Transfer Learning with Reinforced Word Modeling (a.k.a. Trans-RWM), that tightly integrates components of natural language processing, dynamic transfer learning, and scalable prediction to predict user conversion rates. Our model rests on two assumptions: at a higher level, information is transferable between related campaigns; at a lower level, users who searched for similar content or browsed similar pages are more likely to share similar latent purchase interests. The novelties of this framework include (i) a natural language model specifically tailored to the semantic inputs of CVR prediction; (ii) a Bayesian transfer learning model that dynamically transfers knowledge from the source to the future target; and (iii) an automatic updating rule with adaptive regularization, using Stochastic Gradient Monte Carlo, that supports efficient updating of Trans-RWM on high-dimensional, sparse data. We demonstrate that on Brightroll our framework can effectively discriminate extremely rare events in terms of their conversion propensity.