Privacy-aware regression modeling of participatory sensing data

Many participatory sensing applications use data collected by participants to construct a public model of a system or phenomenon. For example, a health application might compute a model relating exercise and diet to weight loss. While the resulting model can be public, the individual input and output data traces used to construct it may be private to participants (e.g., their individual food intake, lifestyle choices, and resulting weight). This paper proposes and experimentally studies a technique that attempts to keep such input and output data traces private while still allowing accurate model construction. Unlike perturbation-based techniques, it adds no noise. The main contribution of the paper is a client-side data transformation that helps keep client data private without introducing any additional error into model construction. We focus on linear regression models, which are widely used in participatory sensing applications. We evaluate our scheme on a data set from a map-based participatory sensing service: a green navigation service that constructs regression models from participant data to predict the fuel consumption of vehicles on road segments. The evaluation provides empirical evidence that: i) an individual data trace is generally hard to reconstruct with any reasonable accuracy, and ii) the regression model constructed from the transformed traces has a much smaller error than one based on additive data-perturbation schemes.
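
The abstract does not spell out the transformation, so the following is only a minimal sketch of one client-side transformation with the "no additional error" property, assumed here for illustration: each client uploads the least-squares summaries X^T X and X^T y of its trace instead of the trace itself, and the server sums them and solves the normal equations. The function names, the two example clients, and their data are hypothetical. The sketch shows only that the fitted model matches the one obtained from pooled raw data; it does not by itself demonstrate the privacy claim.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def client_summary(rows: List[Vector], targets: Vector) -> Tuple[Matrix, Vector]:
    """Run on the client: reduce a private trace to X^T X and X^T y."""
    d = len(rows[0])
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for x, y in zip(rows, targets):
        for i in range(d):
            xty[i] += x[i] * y
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_from_summaries(summaries: List[Tuple[Matrix, Vector]]) -> Vector:
    """Run on the server: add up client summaries and solve the normal equations."""
    d = len(summaries[0][1])
    total_xtx = [[0.0] * d for _ in range(d)]
    total_xty = [0.0] * d
    for xtx, xty in summaries:
        for i in range(d):
            total_xty[i] += xty[i]
            for j in range(d):
                total_xtx[i][j] += xtx[i][j]
    return solve(total_xtx, total_xty)


# Hypothetical traces: each row is [1, x1, x2], generated from y = 2 + 3*x1 - x2.
rows_a = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 4.0, 2.0]]
y_a = [3.0, 7.0, 6.0, 12.0]
rows_b = [[1.0, 0.0, 1.0], [1.0, 5.0, 3.0], [1.0, 2.0, 2.0]]
y_b = [1.0, 14.0, 6.0]

# Model built from per-client summaries only (raw rows never leave the client).
w_private = fit_from_summaries([client_summary(rows_a, y_a),
                                client_summary(rows_b, y_b)])

# Reference model built from the pooled raw data.
w_pooled = fit_from_summaries([client_summary(rows_a + rows_b, y_a + y_b)])

print(w_private)
print(w_pooled)
```

Because the normal equations depend on the data only through these sums, the two coefficient vectors agree up to floating-point rounding, whereas an additive-perturbation scheme would change the sums and hence the fitted model. Note that a summary built from very few rows may still reveal much about the underlying trace, so this sketch should not be read as the paper's privacy mechanism.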
