SNS Privacy Protection based on the ELM Integration and Semi-supervised Clustering

Social network services (SNS) are an emerging class of Web application. As SNS use grows, Web security faces increasingly serious threats, and the leakage of individual privacy is a major one. Because SNS data contain a vast amount of unlabeled data and only a small amount of labeled data, the usability of the data for studying SNS privacy preservation is relatively poor. To address this problem, this paper proposes a Bagging-based ensemble of extreme learning machines (ELM) combined with seed-set semi-supervised clustering for privacy preservation. The method proceeds in three steps: first, the ELM ensemble labels the unlabeled data to enlarge the seed set; second, the seed set is used to initialize the cluster centers; and finally, the algorithm applies semi-supervised clustering to achieve K-anonymity. Experimental results show that the method can improve the usability of the released data while preserving privacy.
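The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the three steps under stated assumptions. It uses a basic ELM with sigmoid hidden units and pseudo-inverse output weights. The ELMs are bagged with majority voting, and a vote-agreement threshold (`agree`, an assumed confidence rule) decides which pseudo-labeled records join the seed set. Seeded K-means (Basu et al.) does the clustering. For the final step it assumes a microaggregation-style procedure: small clusters are merged and records are replaced by their cluster mean. All numeric attributes are treated as quasi-identifiers. Every function name and parameter here is hypothetical and is not the authors' code.

```python
import numpy as np

# Minimal sketch; all names and parameters are hypothetical, not the paper's implementation.


def _hidden(X, W, b):
    # Sigmoid hidden-layer output of an ELM.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, y, n_classes, n_hidden, rng):
    # Basic ELM: random, untrained input weights; output weights solved by least squares.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    T = np.eye(n_classes)[y]                        # one-hot targets
    beta = np.linalg.pinv(_hidden(X, W, b)) @ T     # Moore-Penrose least-squares solution
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return np.argmax(_hidden(X, W, b) @ beta, axis=1)


def bagged_elm_label(X_lab, y_lab, X_unl, n_classes, n_models=11, n_hidden=20, agree=0.8, seed=0):
    # Bagging: each ELM is trained on a bootstrap sample; unlabeled points get the majority vote.
    # Assumed confidence rule: keep a pseudo-label only if >= `agree` of the models voted for it.
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(X_unl), n_classes))
    rows = np.arange(len(X_unl))
    for _ in range(n_models):
        idx = rng.integers(0, len(X_lab), len(X_lab))   # bootstrap sample with replacement
        model = train_elm(X_lab[idx], y_lab[idx], n_classes, n_hidden, rng)
        votes[rows, predict_elm(model, X_unl)] += 1
    return votes.argmax(axis=1), votes.max(axis=1) / n_models >= agree


def seeded_kmeans(X, seed_X, seed_y, n_classes, n_iter=20):
    # Seeded K-means: each initial center is the mean of one class's seeds
    # (every class must have at least one seed).
    centers = np.array([seed_X[seed_y == c].mean(axis=0) for c in range(n_classes)])
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(n_classes):
            if np.any(assign == c):                     # empty clusters keep their old center
                centers[c] = X[assign == c].mean(axis=0)
    return assign


def enforce_k_anonymity(X, assign, k):
    # Assumed post-processing: records in clusters with fewer than k members move to the
    # nearest cluster with >= k members; each record is then replaced by its cluster mean,
    # so every released row equals at least k-1 others (requires len(X) >= k).
    assign = assign.copy()
    sizes = np.bincount(assign)
    big = np.flatnonzero(sizes >= k)
    if len(big) == 0:                                   # no cluster is large enough: one group
        return np.tile(X.mean(axis=0), (len(X), 1)), np.zeros(len(X), dtype=int)
    centers = np.array([X[assign == c].mean(axis=0) for c in big])
    for i in np.flatnonzero(sizes[assign] < k):
        assign[i] = big[((centers - X[i]) ** 2).sum(axis=1).argmin()]
    released = np.empty(X.shape)
    for c in np.unique(assign):
        released[assign == c] = X[assign == c].mean(axis=0)
    return released, assign


def anonymize(X_lab, y_lab, X_unl, n_classes, k):
    labels, confident = bagged_elm_label(X_lab, y_lab, X_unl, n_classes)
    seed_X = np.vstack([X_lab, X_unl[confident]])       # enlarged seed set
    seed_y = np.concatenate([y_lab, labels[confident]])
    X_all = np.vstack([X_lab, X_unl])
    assign = seeded_kmeans(X_all, seed_X, seed_y, n_classes)
    return enforce_k_anonymity(X_all, assign, k)


# Toy data: two well-separated groups, 10 labeled and 80 unlabeled records.
rng = np.random.default_rng(42)
X_lab = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(6, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(6, 1, (40, 2))])
released, groups = anonymize(X_lab, y_lab, X_unl, n_classes=2, k=5)
```

Replacing each record with its cluster mean is one common way (microaggregation) to turn a clustering into K-anonymous groups. Because the seed set fixes the initial centers, the groups follow the class structure learned from the labeled data rather than an arbitrary random start. Raising `k` strengthens anonymity at the cost of coarser released values.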
