PURE: Blind Regression Modeling for Low Quality Data with Participatory Sensing

Participatory regression modeling is a cost-efficient mechanism for establishing the relationships among multiple dimensions of sensory data collected from volunteers. Obtaining an accurate model estimate is challenging for two main reasons. First, because individual private data must remain confidential, the original data are rarely available; second, the collected data are inherently of low quality and contain outliers. In this paper, we propose an innovative scheme, PURE, which can accurately estimate the global regression model without knowing local private data (referred to as blind regression modeling), even when a large portion of the data are outliers. The key idea of PURE is to let individual participants peer-judge the global estimate and further improve it via negotiations. Throughout the whole process, all information is exchanged in aggregated form. By design, PURE is secure and protects individual privacy well. Furthermore, PURE is a lightweight protocol suitable for mobile devices. Extensive trace-driven simulations show that, compared with the state-of-the-art least squares estimator, PURE achieves an accuracy gain of two orders of magnitude even when the ratio of random outliers is near 50 percent.
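To make the idea of exchanging only aggregated information concrete, the following minimal sketch fits a simple linear regression from per-participant sums rather than raw samples. It is not the PURE protocol: it is ordinary least squares computed from aggregated sufficient statistics, it has no outlier handling or negotiation step, and sharing the sums in the clear is not by itself a privacy guarantee. The names `LocalSummary`, `summarize`, and `fit_from_summaries` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregated statistics one participant shares (hypothetical format)."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float


def summarize(points):
    """Run on the participant's device; raw (x, y) samples stay local."""
    return LocalSummary(
        n=len(points),
        sx=sum(x for x, _ in points),
        sy=sum(y for _, y in points),
        sxx=sum(x * x for x, _ in points),
        sxy=sum(x * y for x, y in points),
    )


def fit_from_summaries(summaries):
    """Solve the least squares normal equations for y = slope * x + intercept
    using only the summed statistics from all participants."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sx for s in summaries)
    sy = sum(s.sy for s in summaries)
    sxx = sum(s.sxx for s in summaries)
    sxy = sum(s.sxy for s in summaries)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are degenerate; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Three participants holding noise-free samples of y = 2x + 1 (made-up data).
participants = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_from_summaries([summarize(p) for p in participants])
```

Because this baseline is plain least squares, a single corrupted sample can pull the fitted line far from the truth, which is the weakness the abstract says PURE is designed to address.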
