Privacy Preserving Stream Analytics: The Marriage of Randomized Response and Approximate Computing

How can users' privacy be preserved while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation, and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy guarantees for users, a privacy bound tighter than state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between output accuracy (with error estimation) and query execution budget; (iii) Latency: near real-time stream processing based on a scalable "synchronization-free" distributed architecture. The key idea behind our approach is to marry two existing techniques: sampling (used in the context of approximate computing) and randomized response (used in the context of privacy-preserving analytics). The resulting marriage is complementary: it achieves stronger privacy guarantees and also improves performance, a necessary ingredient for low-latency stream analytics.
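
To make the combination of sampling and randomized response concrete, the following is a minimal sketch rather than the system's actual implementation. It assumes a yes/no query, a two-coin variant of randomized response, and client-side sampling; the parameter names `s` (sampling probability), `p` (probability of answering truthfully), and `q` (probability of a forced "yes"), as well as all function names, are hypothetical and do not come from the abstract.

```python
import random


def randomized_response(truth, p, q, rng):
    """Two-coin randomized response (assumed variant).

    With probability p the client reports its true answer; otherwise it
    reports "yes" with probability q, independent of the truth.
    """
    if rng.random() < p:
        return truth
    return rng.random() < q


def collect(truths, s, p, q, rng):
    """Each client joins the query with probability s (sampling) and,
    if it joins, submits a randomized answer."""
    responses = []
    for truth in truths:
        if rng.random() < s:
            responses.append(randomized_response(truth, p, q, rng))
    return responses


def estimate_yes_fraction(responses, p, q):
    """Invert the randomization.

    The expected observed "yes" rate is p * t + (1 - p) * q, where t is the
    true "yes" fraction, so t is estimated as (observed - (1 - p) * q) / p.
    """
    if not responses:
        raise ValueError("no responses collected")
    observed = sum(responses) / len(responses)
    return (observed - (1 - p) * q) / p


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic population for illustration only: 30% true "yes" answers.
    truths = [i % 10 < 3 for i in range(100_000)]
    responses = collect(truths, s=0.5, p=0.6, q=0.5, rng=rng)
    print(len(responses), estimate_yes_fraction(responses, p=0.6, q=0.5))
```

In this sketch, lowering `s` reduces the number of answers that must be transmitted and processed, while `p` and `q` control how much each individual answer is perturbed; both choices widen the error of the estimate, which is the trade-off an analyst would explore.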
