Cross-Device Tracking of Employees with Social Networks

Companies around the world spend millions of dollars on state-of-the-art network intrusion detection systems to protect their network infrastructure from cyberattacks. Yet they often neglect to protect employees’ personal mobile devices (PMDs). Adversaries could hack employees’ PMDs while they are outside the protection of the company’s firewalls and then exploit them as Trojan horses for indirect attacks, such as slipping in malware while employees charge their PMDs using office devices (ODs), exploiting the PMDs’ Bluetooth or NFC connections to infiltrate the company’s network via vulnerable intermediaries, or using the PMDs’ cameras and microphones as eyes and ears inside the company. Given the cybersecurity risks that compromised PMDs pose, it is important to protect them from being hacked, and the most straightforward method is to prevent adversaries from identifying employees’ PMDs in the first place. Su et al. (2017) suggest the possibility of using social media feeds to de-anonymise web browsing traffic, and in this report we show that this concept can also be exploited to cross-track employees’ ODs and PMDs. Similar to Su et al. (2017), our approach is based on the idea that each person has a distinctive social network, and thus the links appearing in one’s social media feed are also unique. On top of that, we study how this property evolves within a group of people belonging to the same social circle. We find that over time, due to the process of snowballing directed triadic closure, members of a social circle are likely to share similar followees and feeds which are unique to that social circle. This allows us to exploit the formula in Su et al. (2017) to 1) deduce employees’ social media accounts using web browsing traffic observed from the company, and 2) verify whether an anonymous PMD belongs to an employee using web browsing traffic observed from that PMD.
We implement and evaluate our strategy on synthetic web browsing traffic constructed from social circles derived from the Twitter social circles dataset in the Stanford Large Network Dataset Collection, and show that accurate deductions (i.e. TPR > 0.8) and verifications (i.e. TPR and TNR > 0.8) are achievable using our strategy. We also explore how the limitations faced by adversaries in real-life applications affect the performance of our strategy. Finally, we demonstrate how to counter our strategy using four different measures and assess their pros and cons.
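
The deduction and verification steps described above can be illustrated with a minimal sketch. It is not the formula of Su et al. (2017); it is a simplified stand-in that scores each candidate account by the links from the observed traffic that also appear in its feed, giving more weight to links found in few feeds. The function name `rank_candidates`, the weighting, and the toy feeds are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_candidates(observed_links, feeds):
    """Rank candidate accounts by how well their feed matches observed traffic.

    observed_links: iterable of URLs seen in a device's browsing traffic.
    feeds: dict mapping account name -> set of URLs appearing in its feed.
    Hypothetical scoring (an assumption, not the Su et al. formula): each
    matching link adds log(1 + N / popularity), so rarer links count more.
    """
    # Number of feeds in which each link appears.
    popularity = Counter(link for feed in feeds.values() for link in feed)
    n_accounts = len(feeds)
    scores = {}
    for account, feed in feeds.items():
        score = 0.0
        for link in set(observed_links):
            if link in feed:
                score += math.log(1 + n_accounts / popularity[link])
        scores[account] = score
    # Highest score first; ties keep the insertion order of `feeds`.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): three accounts and the links in their feeds.
feeds = {
    "alice": {"a", "b", "c"},
    "bob": {"b", "d"},
    "carol": {"b", "e"},
}

# 1) Deduction: traffic observed from an office device.
office_ranking = rank_candidates(["a", "b", "x"], feeds)

# 2) Verification: does traffic from an anonymous PMD point to the same account?
pmd_ranking = rank_candidates(["c", "b"], feeds)
same_owner = office_ranking[0][0] == pmd_ranking[0][0]
```

In this toy setting both rankings place the same account first, which is the kind of agreement a verification step would look for. A real evaluation would also need a decision threshold to trade off TPR against TNR.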

[1]  Markus Jakobsson,et al.  Invasive browser sniffing and countermeasures , 2006, WWW '06.

[2]  Martin Vetterli,et al.  Where You Are Is Who You Are: User Identification by Matching Statistics , 2015, IEEE Transactions on Information Forensics and Security.

[3]  A. Rapoport Spread of information through a population with socio-structural bias: I. Assumption of transitivity , 1953 .

[4]  Eelco Herder,et al.  Web page revisitation revisited: implications of a long-term click-stream study of browser usage , 2007, CHI.

[5]  Christopher Krügel,et al.  Detection and analysis of drive-by-download attacks and malicious JavaScript code , 2010, WWW '10.

[6]  Li Li,et al.  eXtreme Gradient Boosting for Identifying Individual Users Across Different Digital Devices , 2016, WAIM.

[7]  Yi Tay,et al.  Cross Device Matching for Online Advertising with Neural Feature Ensembles : First Place Solution at CIKM Cup 2016 , 2016, ArXiv.

[8]  Vern Paxson,et al.  An Analysis of China's "Great Cannon" , 2015 .

[9]  Marius Kloft,et al.  Tracked Without a Trace: Linking Sessions of Users by Unsupervised Learning of Patterns in Their DNS Traffic , 2016, AISec@CCS.

[10]  Arvind Narayanan,et al.  De-anonymizing Web Browsing Data with Social Networks , 2017, WWW.

[11]  E. David,et al.  Networks, Crowds, and Markets: Reasoning about a Highly Connected World , 2010 .

[12]  Jiwei Liu,et al.  Connecting Devices to Cookies via Filtering, Feature Engineering, and Boosting , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[13]  Hannes Federrath,et al.  Tracking Users on the Internet with Behavioral Patterns: Evaluation of Its Practical Feasibility , 2012, SEC.

[14]  Nam Khanh Tran Classification and Learning-to-rank Approaches for Cross-Device Matching at CIKM Cup 2016 , 2016, ArXiv.

[15]  Chunming Rong,et al.  Cross-Device Consumer Identification , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[16]  Xing Xie,et al.  Cross-Device User Matching Based on Massive Browse Logs: The Runner-Up Solution for the 2016 CIKM Cup , 2016, ArXiv.

[17]  Narseo Vallina-Rodriguez,et al.  Header Enrichment or ISP Enrichment?: Emerging Privacy Threats in Mobile Networks , 2015, HotMiddlebox@SIGCOMM.

[18]  David K. Y. Yau,et al.  Privacy vulnerability of published anonymous mobility traces , 2010, MobiCom.

[19]  Hannes Federrath,et al.  Behavior-based tracking: Exploiting characteristic patterns in DNS traffic , 2013, Comput. Secur..

[20]  Martín Casado,et al.  Peering Through the Shroud: The Effect of Edge Opacity on IP-Based Client Identification , 2007, NSDI.

[21]  Christo Wilson,et al.  Tracing Information Flows Between Ad Exchanges Using Retargeted Ads , 2018, USENIX Security Symposium.

[22]  Mark Landry,et al.  Multi-layer Classification: ICDM 2015 Drawbridge Cross-Device Connections Competition , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[23]  Edward W. Felten,et al.  Cookies That Give You Away: The Surveillance Implications of Web Tracking , 2015, WWW.

[24]  Arvind Narayanan,et al.  Online Tracking: A 1-million-site Measurement and Analysis , 2016, CCS.

[25]  Silvio Lattanzi,et al.  Linking Users Across Domains with Location Data: Theory and Validation , 2016, WWW.

[26]  Thakur Raj Anand,et al.  Machine Learning Approach to Identify Users Across Their Digital Devices , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[27]  Feida Zhu,et al.  A Comparison of Fundamental Network Formation Principles Between Offline and Online Friends on Twitter , 2016, NetSci-X.

[28]  Mirco Musolesi,et al.  Privacy and the City: User Identification and Location Semantics in Location-Based Social Networks , 2015, ICWSM.

[29]  Claude Castelluccia,et al.  On the uniqueness of Web browsing history patterns , 2014, Ann. des Télécommunications.

[30]  Roberto Díaz-Morales Cross-Device Tracking: Matching Devices and Cookies , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[31]  Aaron Alva,et al.  Cross-Device Tracking: Measurement and Disclosures , 2017, Proc. Priv. Enhancing Technol..

[32]  Mirco Musolesi,et al.  It's the way you check-in: identifying users in location-based social networks , 2014, COSN '14.

[33]  Ming Yang,et al.  A novel attack to track users based on the behavior patterns , 2017, Concurr. Comput. Pract. Exp..

[34]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[35]  Jon M. Kleinberg,et al.  The Directed Closure Process in Hybrid Social-Information Networks, with an Analysis of Link Formation on Twitter , 2010, ICWSM.

[36]  Dan Boneh,et al.  Protecting browser state from web privacy attacks , 2006, WWW '06.

[37]  Sebastian Zimmeck Using Machine Learning to improve Internet Privacy , 2017 .

[38]  Mark S. Granovetter The Strength of Weak Ties , 1973, American Journal of Sociology.

[39]  Christopher Krügel,et al.  A Practical Attack to De-anonymize Social Network Users , 2010, 2010 IEEE Symposium on Security and Privacy.

[40]  Paul Johns,et al.  Exploring Cross-Device Web Use on PCs and Mobile Devices , 2009, INTERACT.

[41]  Saul Greenberg,et al.  How people revisit web pages: empirical findings and implications for the design of history systems , 1997, Int. J. Hum. Comput. Stud..

[42]  Mirco Musolesi,et al.  Spatio-temporal techniques for user identification by means of GPS mobility data , 2015, EPJ Data Science.

[43]  Yong Yu,et al.  Recovering Cross-Device Connections via Mining IP Footprints with Ensemble Learning , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[44]  Franco Zambonelli,et al.  Re-identification and information fusion between anonymized CDR and social network data , 2015, Journal of Ambient Intelligence and Humanized Computing.

[45]  Youngsoo Kim,et al.  Extending the Network: the Influence of Offline Friendship to Twitter Network , 2016, AMCIS.

[46]  Cheng Li,et al.  When a friend in Twitter is a friend in life , 2012, WebSci '12.

[47]  Ashraf Matrawy,et al.  A classification of web browser fingerprinting techniques , 2015, 2015 7th International Conference on New Technologies, Mobility and Security (NTMS).

[48]  Jeremy Walthers,et al.  Learning to Rank for Cross-Device Identification , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[49]  Tomasz Wiktorski,et al.  AFFM: Auto feature engineering in field-aware factorization machines for predictive analytics , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).