Preventing Private Information Inference Attacks on Social Networks

Online social networks, such as Facebook, are used by a growing number of people. These networks allow users to publish details about themselves and to connect with their friends. Some of the information revealed inside these networks is meant to be private, yet corporations could apply learning algorithms to released data to predict undisclosed private information. In this paper, we explore how to launch inference attacks that use released social networking data to predict undisclosed private information about individuals. We then devise three possible sanitization techniques that could be used in various situations. We evaluate the effectiveness of these techniques by implementing them on a dataset obtained from the Dallas/Fort Worth, Texas network of the Facebook social networking application and then attempting to discover sensitive attributes of the dataset using methods of collective inference. We show that the sanitization methods we describe decrease the effectiveness of both local and relational classification algorithms. Further, we discover a problem domain where collective inference degrades the performance of classification algorithms for determining private attributes.
