Matching user accounts across online social networks: methods and applications (Corrélation des profils d'utilisateurs dans les réseaux sociaux : méthodes et applications)

The proliferation of social networks, and of the personal data that people share on them, creates many opportunities for developing new applications. At the same time, the availability of vast amounts of personal data raises privacy and security concerns. In this thesis, we develop methods to identify the social network accounts of a given user. We first study how to exploit the public profiles that users maintain in different social networks to match their accounts. We identify four important properties, Availability, Consistency, non-Impersonability, and Discriminability (ACID), for evaluating how well different profile attributes can match accounts. Exploiting public profiles has good potential for matching accounts because a large number of users have the same names and other personal information across different social networks. Yet it remains challenging to achieve practically useful matching accuracy because of the scale of real social networks. To demonstrate that matching accounts in real social networks is feasible and reliable enough to be used in practice, we focus on designing matching schemes that achieve low error rates even when applied to large-scale networks with hundreds of millions of users. Then, we show that we can still match accounts across social networks even if we only exploit what users post, i.e., their activity on a social network. This demonstrates that, even if users are privacy conscious and maintain distinct profiles on different social networks, we can still potentially match their accounts. Finally, we show that, by identifying accounts that correspond to the same person inside a social network, we can detect impersonators.
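
As a rough illustration of profile-based matching, the sketch below scores candidate accounts by name similarity and keeps those above a threshold. It is not the thesis's matching scheme: the function names, the use of Python's `difflib.SequenceMatcher`, and the 0.9 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two display names (hypothetical helper)."""
    # Normalise case and collapse repeated whitespace before comparing.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def match_candidates(target: str, candidates: list[str], threshold: float = 0.9):
    """Return (candidate, score) pairs scoring at least `threshold`, best first.

    The threshold is an arbitrary illustrative value, not one from the thesis.
    """
    scored = [(c, name_similarity(target, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(match_candidates("Jane Doe", ["jane  doe", "John Smith"]))
```

A name-only scorer like this shows why discriminability matters: in a network with hundreds of millions of users, many distinct people share the same name, so a high similarity score on a single attribute is not enough to keep error rates low.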
