Interval Privacy: A Framework for Data Collection

Growing public awareness of data privacy and emerging government regulation motivate new paradigms for collecting and analyzing data that are transparent and acceptable to data owners. We present a new notion of privacy, together with corresponding data formats, mechanisms, and tradeoffs for privatizing data during data collection. This notion, named Interval Privacy, requires that the conditional distribution of the raw data given the privatized data be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism records each data value as a random interval containing it. Interval privacy mechanisms can be easily deployed through most existing survey-based data collection paradigms, e.g., by asking a respondent whether their data value lies within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not distort it. Conveying information through a narrowed range is complementary to the popular paradigm of perturbing data. In addition, interval mechanisms can generate progressively refined information at the discretion of individual respondents. We study several theoretical aspects of the proposed privacy. In the context of supervised learning, we also offer a method by which existing supervised learning algorithms designed for point-valued data can be directly applied to learning from interval-valued data.
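As a rough illustration of the idea of recording a value as a random interval that contains it, the following minimal Python sketch draws one or more thresholds without looking at the value and reports only which gap between thresholds the value falls into. It is not the paper's implementation: the function name `privatize`, the uniform threshold distribution, the default range [0, 100], and the half-open convention (lo, hi] are all assumptions made for this example.

```python
import random
from bisect import bisect_left
from math import inf


def privatize(value, num_thresholds=1, low=0.0, high=100.0, rng=None):
    """Return a random interval (lo, hi] containing `value`.

    Hypothetical sketch: thresholds are drawn uniformly on [low, high]
    and independently of `value` (an assumption of this example).
    """
    rng = rng or random.Random()
    # Thresholds are generated without reference to the respondent's value.
    cuts = sorted(rng.uniform(low, high) for _ in range(num_thresholds))
    # Index of the first threshold that is >= value.
    i = bisect_left(cuts, value)
    lo = cuts[i - 1] if i > 0 else -inf
    hi = cuts[i] if i < len(cuts) else inf
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(0)
    # One threshold corresponds to a single yes/no survey question.
    print(privatize(42.0, num_thresholds=1, rng=rng))
    # More thresholds give a narrower interval, if the respondent opts in.
    print(privatize(42.0, num_thresholds=5, rng=rng))
```

The reported interval always contains the true value, so the record is coarse but never false. In this sketch, raising `num_thresholds` stands in for a respondent choosing to release more refined information.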
