Privacy-Preserving Data Analytics

Do Le Quoc, TU Dresden, e-mail: do.le_quoc@tu-dresden.de
Martin Beck, TU Dresden, e-mail: martin.beck1@tu-dresden.de
Pramod Bhatotia, University of Edinburgh and Alan Turing Institute, e-mail: pramod.bhatotia@ed.ac.uk
Ruichuan Chen, Nokia Bell Labs, e-mail: ruichuan.chen@nokia-bell-labs.com
Christof Fetzer, TU Dresden, e-mail: christof.fetzer@tu-dresden.de
Thorsten Strufe, TU Dresden, e-mail: thorsten.strufe@tu-dresden.de

Real-time processing of user data streams in online services inadvertently creates tension between users and analysts: users want stronger privacy, while analysts want higher-utility data analytics in real time. To resolve this tension, this paper describes the design, implementation, and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three important properties: (i) Privacy: a zero-knowledge privacy guarantee for users, a privacy bound tighter than state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between output accuracy (with error estimation) and the query execution budget; (iii) Latency: near real-time stream processing based on a scalable “synchronization-free” distributed architecture. The key idea behind PRIVAPPROX is to combine two techniques: sampling (used for approximate computation) and randomized response (used for privacy-preserving analytics). The two techniques are complementary: their combination achieves stronger privacy guarantees and also improves the performance of stream analytics.
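The combination of sampling and randomized response named in the abstract can be illustrated with a minimal sketch for a single yes/no query. This is not the PRIVAPPROX implementation: the function names, the parameters `s` (sampling probability), `p` (probability of answering truthfully), and `q` (probability of a forced "yes"), and the simulated population are all assumptions chosen for illustration.

```python
import random


def client_answer(truth, s, p, q, rng):
    """Hypothetical client step: sample first, then randomize the answer.

    Returns None when the client is not sampled, otherwise a randomized
    boolean answer to a yes/no query.
    """
    # Sampling: each client participates only with probability s.
    if rng.random() >= s:
        return None
    # Randomized response: answer truthfully with probability p ...
    if rng.random() < p:
        return truth
    # ... otherwise answer "yes" with probability q, regardless of the truth.
    return rng.random() < q


def estimate_yes_fraction(answers, p, q):
    """Hypothetical aggregator step: undo the randomization in expectation.

    The expected observed "yes" rate is p * t + (1 - p) * q, where t is the
    true fraction, so t is estimated as (observed - (1 - p) * q) / p.
    """
    responses = [a for a in answers if a is not None]
    if not responses:
        raise ValueError("no sampled responses")
    observed = sum(responses) / len(responses)
    return (observed - (1 - p) * q) / p


def simulate(n_users=100_000, true_fraction=0.3, s=0.6, p=0.7, q=0.5, seed=1):
    """Run the sketch on a synthetic population and return the estimate."""
    rng = random.Random(seed)
    truths = [rng.random() < true_fraction for _ in range(n_users)]
    answers = [client_answer(t, s, p, q, rng) for t in truths]
    return estimate_yes_fraction(answers, p, q)


if __name__ == "__main__":
    print(f"estimated yes fraction: {simulate():.3f}")
```

In this sketch the analyst never sees an individual's true answer, only randomized answers from a random subset of clients, while the aggregate estimate stays close to the true fraction when enough clients respond. Lowering `s` reduces the number of answers to process at the cost of a noisier estimate.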
