Score Look-Alike Audiences

Look-alike models, which are efficient tools for finding similar users from a smaller user set, are quickly revolutionizing the online programmatic advertising industry. The datasets in these contexts exhibit extremely sparse feature spaces on a massive scale, so state-of-the-art look-alike models have traditionally used pairwise similarities to construct these similar user sets. One of the key challenges of similarity-based models is that they do not provide a way to measure the potential value of a user to an advertiser, which is crucial in an advertising context. We propose methods to score users within the expanded audience in a way that relates directly to the business metric the advertiser wants to optimize. We present three scoring models and show, through empirical evaluation on real-world, large-scale data, that incorporating the potential value of a user to an advertiser into the scoring model significantly improves the performance of look-alike models over methods that use only pairwise user similarities.

Datong Chen | Qiang Ma | Zhen Xia | Róbert Ormándi | Jiayi Wen | Eeshan Wagh
