Preventing Private Information Inference Attacks on Social Networks

Online social networks, such as Facebook, are used by an increasing number of people. These networks allow users to publish details about themselves and to connect to their friends. Some of the information revealed inside these networks is meant to be private, yet learning algorithms applied to released data can predict private information. In this paper, we explore how to launch inference attacks using released social networking data to predict private information. We then devise three possible sanitization techniques suited to different situations, evaluate their effectiveness, and apply collective inference methods in an attempt to discover sensitive attributes of the data set. We show that the sanitization methods we describe decrease the effectiveness of both local and relational classification algorithms.
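To make the attack concrete, the sketch below shows one common relational classifier of the kind the paper evaluates: a weighted-vote relational neighbor (wvRN) predictor, which infers a hidden attribute of a user from the labels of their friends. The function name, toy friendship graph, and labels here are illustrative assumptions, not the paper's actual data set or implementation.

```python
def wvrn_predict(graph, labels, node):
    """Predict a node's hidden attribute as the majority label
    among its neighbors whose attribute is publicly known."""
    votes = {}
    for neighbor in graph[node]:
        label = labels.get(neighbor)
        if label is None:
            continue  # skip neighbors whose attribute is hidden
        votes[label] = votes.get(label, 0) + 1
    if not votes:
        return None  # no labeled neighbors: nothing to infer from
    return max(votes, key=votes.get)

# Toy friendship graph: two of "a"'s three friends share a trait,
# so an attacker infers that "a" likely shares it as well.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a"], "c": ["a"], "d": ["a"],
}
labels = {"b": "liberal", "c": "liberal", "d": "conservative"}
print(wvrn_predict(graph, labels, "a"))  # -> liberal
```

Sanitization in this setting amounts to hiding labels (removing entries from `labels`) or removing friendship links (edges in `graph`), both of which starve the relational classifier of votes.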
