Orchard: Differentially Private Analytics at Scale

This paper presents Orchard, a system that can answer queries about sensitive data held by millions of user devices, with strong differential privacy guarantees. Orchard combines high accuracy with good scalability, and it relies on only a single untrusted party to facilitate each query. Moreover, whereas previous solutions with these properties were custom-built for specific queries, Orchard is general and accepts a wide range of queries. It accomplishes this by rewriting queries into distributed protocols, based on cryptographic primitives, that can be executed efficiently at scale. Our prototype of Orchard can execute 14 of 17 queries chosen from the literature; to our knowledge, no other system can handle more than one of them in this setting. The costs are moderate: each user device typically needs only a few megabytes of traffic and a few minutes of computation time. Orchard also includes a novel defense against malicious users who attempt to distort query results.
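To make the differential-privacy guarantee concrete, the sketch below shows the classic Laplace mechanism applied to a sum query: each user's contribution is clipped to bound the query's sensitivity, and noise calibrated to sensitivity/epsilon is added to the aggregate. This is a minimal illustration of the underlying privacy primitive, not Orchard's actual distributed protocol; the function names and the [0, clip] bound are assumptions made for the example.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_sum(values, epsilon: float, clip: float) -> float:
    """Differentially private sum query (illustrative).

    Clipping each value to [0, clip] bounds any single user's influence
    (the sensitivity) by `clip`; adding Laplace(clip / epsilon) noise
    then yields epsilon-differential privacy for the released sum.
    """
    clipped = [min(max(v, 0.0), clip) for v in values]
    return sum(clipped) + laplace_noise(clip / epsilon)

# Example: four users, values clipped to [0, 1], epsilon = 1.
noisy = private_sum([0.2, 0.9, 1.5, 0.4], epsilon=1.0, clip=1.0)
```

In Orchard's setting the clipped values never leave the user devices in the clear; they are combined under encryption, so the analogue of `sum(clipped)` is computed cryptographically rather than by a trusted aggregator.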
