Efficient Private Statistics with Succinct Sketches

Large-scale collection of contextual information is often essential for gathering statistics, training machine learning models, and extracting knowledge from data. The ability to do so in a privacy-preserving way (i.e., without collecting fine-grained user data) enables a number of additional computational scenarios that would be hard, or outright impossible, to realize without strong privacy guarantees. In this paper, we present the design and implementation of practical techniques for privately gathering statistics from large data streams. We build on efficient cryptographic protocols for private aggregation and on data structures for succinct data representation, namely, Count-Min Sketch and Count Sketch. These allow us to reduce the communication and computation complexity incurred by each data source (e.g., an end-user) from linear to logarithmic in the size of its input, while introducing a parameterized, upper-bounded error that does not compromise the quality of the statistics. We then show how our techniques can be used to efficiently instantiate real-world privacy-friendly systems, supporting recommendations for media streaming services, prediction of user locations, and computation of median statistics for Tor hidden services.
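The combination described above rests on one property: both Count-Min Sketch and Count Sketch are linear, so the element-wise sum of per-user sketches equals the sketch of the combined stream. A private aggregation protocol can therefore run over the d·w sketch counters instead of one counter per possible item. For reference, a Count-Min Sketch with width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉ over non-negative counts never underestimates a count, and it overestimates it by more than ε times the total count with probability at most δ. The code below is a minimal illustration under stated assumptions, not the paper's implementation. The names `Sketch`, `blind`, `aggregate`, and the modulus `M` are hypothetical. SHA-256 stands in for a pairwise-independent hash family. Pairwise additive masking stands in for the private aggregation protocol. A seeded `random.Random` stands in for a PRF keyed with a secret shared by each pair of users and is not cryptographically secure.

```python
import hashlib
import random
import statistics

M = 2**32  # hypothetical modulus for masked counters; must exceed any true total


def _hash(seed, row, item, width):
    """Map `item` to a (bucket, sign) pair for one sketch row.

    SHA-256 is a stand-in for the pairwise-independent hash families
    assumed in the sketch literature.
    """
    h = hashlib.sha256(f"{seed}:{row}:{item}".encode()).digest()
    return int.from_bytes(h[:8], "big") % width, (1 if h[8] & 1 else -1)


class Sketch:
    """depth x width counters: signed=False is Count-Min, signed=True is Count Sketch."""

    def __init__(self, width, depth, signed=False, seed=0):
        self.w, self.d, self.signed, self.seed = width, depth, signed, seed
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for r in range(self.d):
            b, s = _hash(self.seed, r, item, self.w)
            self.table[r][b] += (s if self.signed else 1) * count

    def estimate(self, item):
        vals = []
        for r in range(self.d):
            b, s = _hash(self.seed, r, item, self.w)
            vals.append(s * self.table[r][b] if self.signed else self.table[r][b])
        # Count-Min takes the row minimum; Count Sketch takes the row median.
        return statistics.median(vals) if self.signed else min(vals)


def blind(sketch, user, n_users, round_id=0):
    """Flatten a user's sketch and add pairwise masks that cancel in the sum.

    ASSUMPTION: users i and j share a secret (e.g., from a key exchange).
    A seeded random.Random stands in for a PRF keyed with that secret;
    it is NOT cryptographically secure and is for illustration only.
    """
    flat = [v for row in sketch.table for v in row]
    for other in range(n_users):
        if other == user:
            continue
        lo, hi = sorted((user, other))
        rng = random.Random(f"pair:{lo}:{hi}:{round_id}")
        sign = 1 if user == lo else -1  # lower id adds the mask, higher id subtracts it
        flat = [(v + sign * rng.randrange(M)) % M for v in flat]
    return flat


def aggregate(blinded, width, depth, signed=False, seed=0):
    """Sum blinded vectors mod M; the masks cancel, leaving the summed sketch."""
    total = [sum(col) % M for col in zip(*blinded)]
    total = [v - M if v >= M // 2 else v for v in total]  # recover negative counters
    agg = Sketch(width, depth, signed=signed, seed=seed)
    agg.table = [total[r * width:(r + 1) * width] for r in range(depth)]
    return agg


if __name__ == "__main__":
    streams = [["a", "b"], ["a", "c"], ["a"]]  # toy per-user inputs
    W, D = 64, 5
    blinded = []
    for i, items in enumerate(streams):
        sk = Sketch(W, D)
        for it in items:
            sk.add(it)
        blinded.append(blind(sk, i, len(streams)))
    agg = aggregate(blinded, W, D)
    print("estimated count of 'a':", agg.estimate("a"))  # never below the true count, 3
```

For each pair (i, j), one user adds a mask and the other subtracts it, so all masks cancel in the sum. The aggregator therefore learns only the combined sketch, while each user uploads d·w values regardless of how many distinct items exist. A real deployment would also have to handle users who drop out mid-round, which this sketch ignores.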
