Differential privacy for publishing enterprise-scale WLAN traces

Wireless trace data play an important role in wireless network research. However, publishing raw WLAN traces poses privacy risks to network users. It is therefore necessary to sanitize users' sensitive information before these traces are published, while still providing high data utility for wireless network research. Some existing works have begun to address the problem of sanitizing WLAN traces with various anonymization methods. However, our analysis of a real WLAN trace dataset shows that these anonymization techniques cannot provide a strong and provable privacy guarantee. Differential privacy is the only framework that can provide such a guarantee. However, our analysis also shows that existing differential privacy studies fail to provide effective data utility for query operations on multi-dimensional, large-scale datasets. This paper targets WLAN trace datasets, which are characteristically multi-dimensional and large-scale. It proposes a privacy-preserving data publishing algorithm that not only satisfies differential privacy but also achieves high data utility for query operations. We prove that the proposed sanitization algorithm satisfies $\epsilon$-differential privacy. Furthermore, our theoretical analysis shows that the noise variance in the sanitization algorithm is $O(\log^{o(1)} n/\epsilon^2)$, which indicates that the algorithm can achieve high data utility on large-scale datasets. Finally, extensive experiments on an enterprise-scale WLAN trace dataset show that the sanitization algorithm provides high data utility for query operations.
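The abstract does not specify the sanitization algorithm, so the sketch below does not reproduce it. It only illustrates the standard building block that "$\epsilon$-differential privacy" and a noise-variance bound refer to: the Laplace mechanism of Dwork et al. A randomized mechanism $M$ is $\epsilon$-differentially private if $\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S]$ for every pair of neighbouring datasets $D, D'$ and every set of outputs $S$. Adding Laplace noise of scale $\Delta/\epsilon$ to a query with $L_1$ sensitivity $\Delta$ satisfies this definition, and the resulting noise variance is $2\Delta^2/\epsilon^2$. The record layout (user, access point, hour), the function names `laplace_sample` and `private_count`, and the record-level notion of neighbouring datasets (so that a count has sensitivity 1) are illustrative assumptions, not details taken from the paper.

```python
import random
import statistics


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. Exponential variables with rate 1/scale
    is Laplace-distributed with location 0 and the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a count query with the Laplace mechanism (epsilon-DP).

    Assumes record-level neighbouring datasets (add or remove one record),
    so the count changes by at most 1 and its L1 sensitivity is 1.
    """
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_sample(sensitivity / epsilon, rng)


# Hypothetical association records: (user_id, access_point, hour_of_day).
records = [
    ("u1", "ap-3", 9),
    ("u2", "ap-3", 9),
    ("u3", "ap-7", 9),
    ("u1", "ap-3", 10),
    ("u4", "ap-3", 9),
]

rng = random.Random(42)
epsilon = 0.5

# How many associations did ap-3 see at 09:00? The true answer is 3.
noisy = private_count(records, lambda r: r[1] == "ap-3" and r[2] == 9, epsilon, rng)
print(f"noisy count: {noisy:.2f}")

# Empirical check of the noise variance against the theoretical 2 / epsilon^2.
samples = [laplace_sample(1.0 / epsilon, rng) for _ in range(100_000)]
print(f"empirical variance: {statistics.pvariance(samples):.2f}")
print(f"theoretical variance: {2 / epsilon ** 2:.2f}")
```

Answering a range query by summing $k$ independently noised cells yields variance $2k/\epsilon^2$, which grows with the size of the range. That growth is what makes naive per-cell noise costly on large datasets, and it is the baseline against which a bound such as $O(\log^{o(1)} n/\epsilon^2)$ should be read. If one user can contribute many records, user-level privacy requires a larger sensitivity than the value 1 assumed here.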
