Multiparty Reach and Frequency Histogram: Private, Secure, and Practical

Abstract. Consider the setting where multiple parties each hold a multiset of users, and the task is to estimate the reach (i.e., the number of distinct users appearing across all parties) and the frequency histogram (i.e., the fraction of users appearing a given number of times across all parties). In this work, we introduce a new sketch for this task, based on an exponentially distributed counting Bloom filter. We combine this sketch with a communication-efficient multi-party protocol to solve the task in the multi-worker setting. Our protocol provides both differential privacy and security guarantees in the honest-but-curious model, even in the presence of large subsets of colluding workers; furthermore, its reach and frequency histogram estimates have provably small error. Finally, we show the practicality of the protocol by evaluating it on internet-scale audiences.
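To make the two target quantities concrete, the following is a minimal sketch that computes them exactly and in the clear on toy data. It is not the sketch or the protocol introduced in this work: it uses no Bloom filter, no noise, and no encryption. The function name `reach_and_frequency` and the sample data are hypothetical, and the sketch assumes that a user's frequency is the total number of appearances summed over all parties.

```python
from collections import Counter


def reach_and_frequency(parties):
    """Exact reach and frequency histogram for a list of per-party multisets.

    Assumption: a user's frequency is the sum of their multiplicities
    across all parties (illustrative reading of the task definition).
    """
    total = Counter()
    for users in parties:
        total.update(users)  # adds multiplicities across parties

    reach = len(total)  # number of distinct users
    if reach == 0:
        return 0, {}

    # Number of users at each frequency, normalised by reach.
    users_per_frequency = Counter(total.values())
    histogram = {k: c / reach for k, c in sorted(users_per_frequency.items())}
    return reach, histogram


# Hypothetical toy input: three parties, each holding a multiset of user IDs.
parties = [["alice", "bob", "bob"], ["bob", "carol"], ["alice", "dave"]]
reach, histogram = reach_and_frequency(parties)
print(reach, histogram)
```

Here four distinct users appear, two of them once, one twice, and one three times. A private multi-party protocol aims to approximate these same values without any party revealing its raw multiset.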
