Score Look-Alike Audiences

Look-alike models, which efficiently find users similar to a smaller set of users, are quickly transforming the online programmatic advertising industry. Datasets in this setting have extremely sparse feature spaces at massive scale, so state-of-the-art look-alike models have traditionally relied on pairwise similarities to construct these similar-user sets. A key limitation of similarity-based models is that they provide no way to measure a user's potential value to an advertiser, which is crucial in an advertising context. We propose methods that score users within the expanded audience in a way that relates directly to the business metric the advertiser wants to optimize. We present three scoring models and show, through empirical evaluation on real-world, large-scale data, that incorporating a user's potential value to an advertiser into the scoring model significantly improves the performance of look-alike models over methods that use only pairwise similarities between users.
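
To make the contrast between similarity-only ranking and value-aware scoring concrete, the following is a minimal sketch and not one of the three scoring models from the paper, which the abstract does not specify. It assumes users are represented as sparse sets of feature names, uses Jaccard similarity as the pairwise measure, and attaches a hypothetical per-user `value` (for example, observed revenue) to each user in the smaller starting set. All function names, feature names, and numbers are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# A seed user: (sparse feature set, hypothetical value to the advertiser).
Seed = Tuple[Set[str], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sparse feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_score(candidate: Set[str], seeds: List[Seed]) -> float:
    """Similarity-only baseline: highest similarity to any seed user."""
    return max((jaccard(candidate, feats) for feats, _ in seeds), default=0.0)


def value_weighted_score(candidate: Set[str], seeds: List[Seed]) -> float:
    """Illustrative value-aware score (an assumption, not the paper's model):
    the similarity-weighted average of the seed users' values."""
    total_sim = 0.0
    weighted_value = 0.0
    for feats, value in seeds:
        sim = jaccard(candidate, feats)
        total_sim += sim
        weighted_value += sim * value
    return weighted_value / total_sim if total_sim > 0 else 0.0


def rank(candidates: Dict[str, Set[str]], seeds: List[Seed], scorer) -> List[str]:
    """Return candidate ids ordered from highest to lowest score."""
    return sorted(candidates, key=lambda uid: scorer(candidates[uid], seeds), reverse=True)


# Hypothetical data: the first seed never converted, the second generated value.
seeds: List[Seed] = [
    ({"sports", "news", "travel"}, 0.0),
    ({"finance", "news"}, 5.0),
]
candidates: Dict[str, Set[str]] = {
    "u1": {"sports", "news", "travel", "music"},
    "u2": {"finance", "news", "music"},
}

by_similarity = rank(candidates, seeds, similarity_score)
by_value = rank(candidates, seeds, value_weighted_score)
print(by_similarity)
print(by_value)
```

In this toy data, `u1` is the closest match to a seed user but resembles the seed with no value, while `u2` resembles the valuable seed, so the two scorers order the candidates differently. That difference is the gap the abstract describes between pairwise-similarity methods and scoring tied to an advertiser's business metric.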
