Leveraging Blogging Activity on Tumblr to Infer Demographics and Interests of Users for Advertising Purposes

As one of the leading platforms for creative content, Tumblr offers advertisers a unique way of creating brand identity. Advertisers can tell their story through images, animation, text, music, video and more, and they can promote that content by sponsoring it to appear as an advertisement in the streams of Tumblr users. In this paper, we present a framework that enabled one of the key targeted advertising components for Tumblr: gender and interest targeting. We describe the main challenges involved in developing the framework, which include creating a ground truth for training gender prediction models and mapping Tumblr content to an interest taxonomy. To infer user interests, we propose a novel semi-supervised neural language model for categorizing Tumblr content (i.e., post tags and post keywords). The model was trained on a large-scale data set of 6.8 billion user posts, with only a very limited number of categorized keywords, and it outperformed the baseline models. We successfully deployed gender and interest targeting in Yahoo production systems, delivering inference for users who account for more than 90% of daily activity on Tumblr. Online performance results indicate the advantage of the proposed approach: we observed a 20% increase in user engagement with sponsored posts compared with untargeted campaigns.
