Clustering for Private Interest-based Advertising

We study the problem of designing privacy-enhanced solutions for interest-based advertising (IBA). IBA is a key component of the online ads ecosystem and provides a better ad experience to users, since it enables advertisers to show users impressions that are relevant to them. Ad tech companies currently achieve this, however, by building detailed interest profiles for individual users. In this work we ask whether such fine-grained personalization is required, and we present mechanisms that achieve competitive performance while giving privacy guarantees to end users. More precisely, we present the first detailed exploration of how to implement Chrome's Federated Learning of Cohorts (FLoC) API. We define the privacy properties required for the API and evaluate multiple hashing and clustering algorithms, discussing the trade-offs between utility, privacy, and ease of implementation.
