Distributed Clustering in the Anonymized Space with Local Differential Privacy

Clustering and analyzing collected data can improve user experience and quality of service in big data and IoT applications. However, directly releasing the original data raises privacy concerns, which creates both challenges and opportunities for privacy-preserving clustering. In this paper, we study the problem of non-interactive clustering in a distributed setting under the framework of local differential privacy. We first extend the Bit Vector, a novel anonymization mechanism, so that it is both functionality-capable and privacy-preserving. Based on the modified encoding mechanism, we propose the kCluster algorithm, which performs clustering in the anonymized space. We also show that the modified encoding mechanism can be easily integrated into existing clustering algorithms that rely only on distance information, such as DBSCAN. Theoretical analysis and experimental results validate the effectiveness of the proposed schemes.
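
To make the idea of distance-based clustering in an anonymized space concrete, the following is a minimal sketch and not the paper's Bit Vector mechanism or the kCluster algorithm. It assumes a stand-in thermometer encoding (so Hamming distance grows with the numeric gap), per-bit randomized response with the budget epsilon split evenly across bits, and a debiased Hamming distance that a distance-only algorithm such as DBSCAN could consume. All function names and parameters here are hypothetical.

```python
import math
import random


def encode(value, lo, hi, n_bits):
    """Thermometer-encode value in [lo, hi]: bit i is 1 if value exceeds threshold i.

    Assumption: this is an illustrative distance-aware encoding, not the
    paper's Bit Vector construction.
    """
    step = (hi - lo) / n_bits
    return [1 if value > lo + i * step else 0 for i in range(n_bits)]


def keep_probability(epsilon, n_bits):
    """Probability of reporting a bit unchanged when epsilon is split over n_bits."""
    eps_bit = epsilon / n_bits
    return 1.0 / (1.0 + math.exp(-eps_bit))


def randomize(bits, epsilon, rng):
    """Per-bit randomized response, run locally by each user before reporting."""
    p = keep_probability(epsilon, len(bits))
    return [b if rng.random() < p else 1 - b for b in bits]


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def estimate_distance(noisy_hamming, n_bits, epsilon):
    """Debias the Hamming distance between two randomized reports.

    With keep probability p and flip probability q = 1 - p, the expected
    noisy distance is d * (p - q)**2 + 2 * n_bits * p * q, which is inverted
    here. Requires epsilon > 0 so that p != q.
    """
    p = keep_probability(epsilon, n_bits)
    q = 1.0 - p
    return (noisy_hamming - 2.0 * n_bits * p * q) / (p - q) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.10, 0.12, 0.90]
    reports = [randomize(encode(v, 0.0, 1.0, 16), 8.0, rng) for v in values]
    # Pairwise estimated distances are all a distance-only clusterer needs.
    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            d = estimate_distance(hamming(reports[i], reports[j]), 16, 8.0)
            print(i, j, round(d, 2))
```

The collector never sees the raw values; it only computes pairwise distance estimates from the randomized reports. Splitting the budget evenly over the bits is a conservative choice made for this sketch, and the estimates become noisy as the number of bits grows or epsilon shrinks.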
