Leaking privacy and shadow profiles in online social networks

A data-driven audit shows how data from the users of an online social network predict personal information of nonusers.

Social interaction and data integration in the digital society can affect the control that individuals have over their privacy. Social networking sites can access data from other services, including user contact lists in which nonusers also appear. Although most research on online privacy has focused on inferring personal information of users, this data integration raises the question of whether personal information of nonusers can be predicted as well. This article tests the shadow profile hypothesis, which postulates that the data given by the users of an online service predict personal information of nonusers. Using data from a defunct social networking site, we perform a historical audit to evaluate whether personal data of nonusers could have been predicted from the personal data and contact lists shared by the users of the site. We analyze two attributes, sexual orientation and relationship status, which follow regular mixing patterns in the social network. Going back in time over the growth of the network, we measure predictor performance as a function of network size and of the tendency of users to disclose their contact lists. This article presents robust evidence supporting the shadow profile hypothesis and reveals a multiplicative effect of network size and disclosure tendencies that accelerates the performance of predictors. These results call for new privacy paradigms that take into account the fact that individual privacy decisions do not happen in isolation and are mediated by the decisions of others.
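The idea behind the shadow profile hypothesis can be illustrated with a minimal sketch. It is not the predictor used in the article, which the abstract does not specify. It assumes a simple neighbor vote: each nonuser is scored by the share of disclosing users listing them who have a given binary attribute, and the scores are then evaluated with a rank-based AUC. All names (`shadow_scores`, `auc`, `user_attr`, `contact_lists`) and the toy data are hypothetical.

```python
from itertools import product

def shadow_scores(user_attr, contact_lists):
    """Score each nonuser by the share of listing users whose attribute is 1.

    Assumption: a plain neighbor vote, chosen for illustration only.
    """
    listed_by = {}
    for user, contacts in contact_lists.items():
        for person in contacts:
            if person not in user_attr:  # never joined the site: a nonuser
                listed_by.setdefault(person, []).append(user_attr[user])
    return {p: sum(votes) / len(votes) for p, votes in listed_by.items()}

def auc(scores, truth):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for p, s in scores.items() if truth[p] == 1]
    neg = [s for p, s in scores.items() if truth[p] == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): four users with a binary attribute, three nonusers.
user_attr = {"u1": 1, "u2": 1, "u3": 0, "u4": 0}
contact_lists = {
    "u1": {"n1", "u2"},
    "u2": {"n1", "n2"},
    "u3": {"n2", "n3"},
    "u4": {"n3", "u3"},
}
# Ground truth for nonusers, available only to the auditor, not to the site.
truth = {"n1": 1, "n2": 1, "n3": 0}

scores = shadow_scores(user_attr, contact_lists)
performance = auc(scores, truth)
```

In this sketch, removing users or emptying some contact lists leaves fewer votes per nonuser, which is a rough analogue of varying network size and disclosure tendency in the audit.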
