Privacy Leakage through Innocent Content Sharing in Online Social Networks

The increased popularity and ubiquitous availability of online social networks, together with globalised Internet access, have changed the way in which people share content. The information that users willingly disclose on these platforms can be used for various purposes, from building consumer models for advertising to inferring personal, potentially invasive, information. In this work, we use Twitter, Instagram and Foursquare data to show that the content shared by users, especially when aggregated across platforms, can disclose more information than was originally intended. We perform two case studies. In the first, we de-anonymize users in a scenario that mimics identifying the author of anonymous posts within a group of users. Empirical evaluation on a sample of real-world social network profiles suggests that cross-platform aggregation introduces significant performance gains in user identification. In the second, we show that it is possible to infer a user's visits to physical locations from shared Twitter and Instagram content. We present an informativeness scoring function that estimates the relevance and novelty of a shared piece of information with respect to an inference task. This measure is validated using an active learning framework that chooses the most informative content at each point in time. On a large-scale data sample, we show that selecting content in this way improves inference performance, in some cases exceeding the performance obtained from the user's full timeline.
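The abstract does not define the informativeness scoring function, so the following is only a minimal sketch of the general idea of scoring content by relevance and novelty and then greedily choosing the most informative item at each step. Everything in it is an assumption made for illustration: the bag-of-words representation, cosine similarity, the product form `relevance * (1 - redundancy)`, and the names `informativeness` and `select_posts` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def informativeness(post: Counter, task: Counter, selected: list) -> float:
    """Hypothetical score: relevance to the task, discounted by redundancy.

    Relevance is similarity to the task description; redundancy is the
    highest similarity to any post that has already been selected.
    """
    relevance = cosine(post, task)
    redundancy = max((cosine(post, s) for s in selected), default=0.0)
    return relevance * (1.0 - redundancy)


def select_posts(posts: list, task_terms: str, budget: int) -> list:
    """Greedily pick up to `budget` posts, most informative first."""
    task = Counter(task_terms.lower().split())
    vectors = [Counter(p.lower().split()) for p in posts]
    remaining = list(range(len(posts)))
    chosen = []
    while remaining and len(chosen) < budget:
        picked = [vectors[i] for i in chosen]
        # Re-score every remaining post against what was already chosen.
        best = max(
            remaining,
            key=lambda i: informativeness(vectors[i], task, picked),
        )
        chosen.append(best)
        remaining.remove(best)
    return [posts[i] for i in chosen]


if __name__ == "__main__":
    timeline = [
        "coffee at the cafe downtown",
        "coffee at the cafe downtown",
        "great match at the stadium tonight",
        "my cat is asleep",
    ]
    for post in select_posts(timeline, "coffee cafe stadium", budget=2):
        print(post)
```

In this toy timeline the duplicated post is skipped on the second step because its novelty is close to zero, which is the behaviour a relevance-and-novelty score is meant to produce. A real system would presumably replace the word-overlap similarity with features suited to the inference task.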
