A study of event frequency profiling with differential privacy

Program profiling is widely used to measure run-time execution properties---for example, the frequency of method and statement execution. Such profiling could be applied to deployed software to gain performance insights about the behavior of many instances of the analyzed software. However, such data gathering raises privacy concerns: for example, it reveals whether (and how often) a software user accesses a particular software functionality. There is growing interest in adding privacy protections for many categories of data analyses, but such techniques have not been studied sufficiently for program event profiling. We propose the design of privacy-preserving event frequency profiling for deployed software. Each instance of the targeted software gathers its own event frequency profile and then randomizes it. The resulting noisy data has well-defined privacy properties, characterized via the powerful machinery of differential privacy. After gathering this data from many software instances, the profiling infrastructure computes estimates of population-wide frequencies while adjusting for the effects of the randomization. The approach employs static analysis to determine constraints that must hold in all valid run-time profiles, and uses quadratic programming to reduce the error of the estimates under these constraints. Our experiments study different choices for randomization and the resulting effects on the accuracy of frequency estimates. Our conclusion is that well-designed solutions can achieve both high accuracy and principled privacy-by-design for the fundamental problem of event frequency profiling.

[1]  Loris D'Antoni,et al.  Control-flow recovery from partial failure reports , 2017, PLDI.

[2]  Jong-Deok Choi,et al.  Accurate, efficient, and adaptive calling context profiling , 2006, PLDI '06.

[3]  Catuscia Palamidessi,et al.  Broadening the Scope of Differential Privacy Using Metrics , 2013, Privacy Enhancing Technologies.

[4]  Martin J. Wainwright,et al.  Local privacy and statistical minimax rates , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[5]  Boris Škorić,et al.  Improving Frequency Estimation under Local Differential Privacy , 2019, WPES@CCS.

[6]  Atanas Rountev,et al.  Introducing Privacy in Screen Event Frequency Analysis for Android Apps , 2019, 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM).

[7]  Janardhan Kulkarni,et al.  Collecting Telemetry Data Privately , 2017, NIPS.

[8]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[9]  Alessandro Orso,et al.  A Technique for Enabling and Supporting Debugging of Field Failures , 2007, 29th International Conference on Software Engineering (ICSE'07).

[10]  Samuel Z. Guyer,et al.  Elephant tracks: portable production of complete and precise gc traces , 2013, ISMM '13.

[11]  Michael I. Jordan,et al.  Bug isolation via remote program sampling , 2003, PLDI.

[12]  H. Storkel Learning New Words , 2001 .

[13]  Kobbi Nissim,et al.  Clustering Algorithms for the Centralized and Local Models , 2017, ALT.

[14]  Matthew Arnold,et al.  A Survey of Adaptive Optimization in Virtual Machines , 2005, Proceedings of the IEEE.

[15]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[16]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[17]  Toshiaki Yasue,et al.  A dynamic optimization framework for a Java just-in-time compiler , 2001, OOPSLA '01.

[18]  Wei-Ying Ma,et al.  Automated known problem diagnosis with event traces , 2006, EuroSys.

[19]  Ninghui Li,et al.  Locally Differentially Private Protocols for Frequency Estimation , 2017, USENIX Security Symposium.

[20]  Yin Yang,et al.  Collecting and Analyzing Data from Smart Device Users with Local Differential Privacy , 2016, ArXiv.

[21]  Madeline Diep,et al.  Profiling deployed software: assessing strategies and testing opportunities , 2005, IEEE Transactions on Software Engineering.

[22]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[23]  Ninghui Li,et al.  Locally Differentially Private Frequent Itemset Mining , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[24]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[25]  Alessandro Orso,et al.  BugRedux: Reproducing field failures for in-house debugging , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[26]  Thomas Steinke,et al.  Differential Privacy: A Primer for a Non-Technical Audience , 2018 .

[27]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[28]  Úlfar Erlingsson,et al.  Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries , 2015, Proc. Priv. Enhancing Technol..

[29]  Uri Stemmer,et al.  Heavy Hitters and the Structure of Local Privacy , 2017, PODS.

[30]  Raef Bassily,et al.  Practical Locally Private Heavy Hitters , 2017, NIPS.

[31]  Michael D. Bond,et al.  Continuous path and edge profiling , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[32]  Jong-Deok Choi,et al.  Finding and Removing Performance Bottlenecks in Large Systems , 2004, ECOOP.

[33]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[34]  Michael D. Bond,et al.  Breadcrumbs: efficient context sensitivity for dynamic bug detection analyses , 2010, PLDI '10.

[35]  Sebastian G. Elbaum,et al.  An empirical study of profiling strategies for released software and their impact on testing activities , 2004, ISSTA '04.

[36]  Pankaj Dhoolia,et al.  Distributed program tracing , 2013, ESEC/FSE 2013.

[37]  Dongmei Zhang,et al.  Performance debugging in the large via mining millions of stack traces , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[38]  Chandra Krintz,et al.  Efficient remote profiling for resource-constrained devices , 2006, TACO.

[39]  Yin Yang,et al.  Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy , 2016, CCS.

[40]  Raef Bassily,et al.  Local, Private, Efficient Protocols for Succinct Histograms , 2015, STOC.

[41]  Huaimin Wang,et al.  Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems , 2013, IEEE Transactions on Parallel and Distributed Systems.

[42]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[43]  James R. Larus,et al.  Optimally profiling and tracing programs , 1994, TOPL.

[44]  Adam D. Smith,et al.  Is Interaction Necessary for Distributed Private Learning? , 2017, 2017 IEEE Symposium on Security and Privacy (SP).