A weighted K-member clustering algorithm for K-anonymization

As a representative model for privacy-preserving data publishing, K-anonymity has raised a considerable number of research questions over the past few decades. Chief among them are how to release data without sacrificing users' privacy and how to maximize the availability of the published data, which together form the ultimate goal of privacy-preserving data publishing. To improve the clustering effect and reduce unnecessary computation, this paper proposes a weighted K-member clustering algorithm. A series of weight indicators is designed to evaluate the outlyingness of records, the distance between records, and the information loss of the published data. The proposed algorithm reduces the influence of outliers on the clustering effect and preserves the availability of the data as far as possible during the clustering process. Experimental analysis suggests that, compared with some existing methods, the proposed method incurs lower information loss, improves the clustering effect, and is less sensitive to outliers.
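The abstract does not give the weight formulas, so the following is only a minimal sketch of a greedy K-member clustering loop under stated assumptions, not the paper's algorithm. The names `distance`, `info_loss`, and `k_member_clusters`, the per-attribute `weights`, the use of mean weighted distance as an outlyingness score, and the choice to seed each cluster with the least outlying record are all hypothetical. Records are assumed to be purely numeric.

```python
from typing import List, Sequence

Record = Sequence[float]


def distance(a: Record, b: Record, weights: Sequence[float]) -> float:
    """Weighted L1 distance between two numeric records (assumed metric)."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def info_loss(cluster: List[Record], weights: Sequence[float]) -> float:
    """Cluster size times the weighted sum of attribute ranges.

    A common proxy for generalization loss; the paper's measure may differ.
    """
    spread = sum(
        w * (max(r[d] for r in cluster) - min(r[d] for r in cluster))
        for d, w in enumerate(weights)
    )
    return len(cluster) * spread


def k_member_clusters(
    records: List[Record], k: int, weights: Sequence[float]
) -> List[List[int]]:
    """Greedily group record indices into clusters of at least k members."""
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need at least k records and k >= 1")

    # Hypothetical outlyingness score: mean weighted distance to all others.
    outlyingness = [
        sum(distance(records[i], records[j], weights) for j in range(n))
        / max(n - 1, 1)
        for i in range(n)
    ]

    def loss_with(cluster: List[int], candidate: int) -> float:
        members = [records[j] for j in cluster] + [records[candidate]]
        return info_loss(members, weights)

    remaining = list(range(n))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        # Assumption: seed with the least outlying record so that
        # outliers do not anchor a cluster.
        seed = min(remaining, key=lambda i: outlyingness[i])
        remaining.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            # Add the record whose inclusion yields the lowest loss.
            best = min(remaining, key=lambda i: loss_with(cluster, i))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)

    # Fewer than k records are left: attach each to the cluster whose
    # information loss grows the least.
    for i in remaining:
        target = min(
            clusters,
            key=lambda c: loss_with(c, i)
            - info_loss([records[j] for j in c], weights),
        )
        target.append(i)
    return clusters
```

Calling `k_member_clusters(records, 3, (1.0, 1.0))` on a list of two-attribute records returns lists of record indices, each of size at least 3, which could then be generalized per cluster to obtain a 3-anonymous table.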
