Understanding Site-Based Inference Potential for Identifying Hidden Attributes

The popularity of social networking sites has led to the creation of massive online databases containing (potentially sensitive) personal information, portions of which are often publicly accessible. Although most popular social networking sites allow users to customize the degree to which their information is publicly exposed, the disclosure of even a small, seemingly innocuous set of profile attributes may be sufficient to infer a surprisingly revealing set of attribute-value pairings. This paper analyzes the predictive accuracy of existing and ensemble inference algorithms in inferring hidden attributes from publicly exposed attribute values. For our tested population, we find that (i) certain attributes are more accurately predicted than others, (ii) each tested inference algorithm is well suited to inferring a particular subset of attributes, and (iii) these subsets of inferable attributes often have little overlap. Taken collectively, our results indicate that the amount of information one can extract from a given user's public profile is often greater than the sum of the attributes that the user has chosen to publish.
