Achieving Differential Privacy Publishing of Location-Based Statistical Data Using Grid Clustering

Statistical partitioning and publishing is commonly used in location-based big data services to answer queries such as the number of points of interest, available vehicles, traffic flows, or infected patients within a given range. Adding noise to location-based statistical data according to the differential privacy model reduces the risks caused by location privacy leakage while preserving the statistical characteristics of the published data. Traditional statistical partitioning and publishing methods decompose and index 2D space from the top down. However, they are prone to over-partitioning or under-partitioning and therefore require multiple scans of the data. This paper proposes a grid clustering and differential privacy protection method for location-based statistical big data publishing scenarios. We compute location-based statistics in units of equal-sized grids and use the discrete wavelet transform to classify uniformly distributed grids by density. A bottom-up grid clustering algorithm is then applied to blank grids and uniform grids of the same density level, based on neighborhood similarity. Laplace noise is added to the clustering results according to the differential privacy model to form the published statistics. Experimental comparisons on real-world datasets show that the proposed grid clustering and differential privacy publishing method outperforms existing partition publishing methods in range query accuracy and running efficiency.
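The two ends of the pipeline described above, counting locations in equal-sized grids and perturbing the counts with Laplace noise, can be sketched as follows. This is a minimal illustration and not the paper's algorithm: it omits the wavelet-based density classification and the bottom-up clustering step. The function names (`grid_counts`, `laplace_noise`, `publish`), the bounding-box format, and the sensitivity of 1 (one location added or removed changes one count by 1) are assumptions made for the example.

```python
import random


def grid_counts(points, bounds, rows, cols):
    """Count points in each cell of an equal-sized rows x cols grid.

    bounds is (min_x, min_y, max_x, max_y); points outside it are skipped.
    """
    min_x, min_y, max_x, max_y = bounds
    cell_w = (max_x - min_x) / cols
    cell_h = (max_y - min_y) / rows
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Clamp so points on the upper boundary fall into the last cell.
        c = min(int((x - min_x) / cell_w), cols - 1)
        r = min(int((y - min_y) / cell_h), rows - 1)
        counts[r][c] += 1
    return counts


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish(counts, epsilon, rng=None, sensitivity=1.0):
    """Return noisy counts using the Laplace mechanism.

    Assumption: each location falls in exactly one cell, so the cells
    are disjoint and every cell can use the same epsilon.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.9, 0.9), (0.6, 0.2), (1.0, 1.0)]
    counts = grid_counts(pts, (0.0, 0.0, 1.0, 1.0), rows=2, cols=2)
    print(counts)
    print(publish(counts, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate range queries. Grouping similar grids before adding noise, as the proposed method does, is one way to limit how much noise accumulates over sparse regions.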
