Behavior-based tracking of Internet users with semi-supervised learning

Behavior-based tracking is an unobtrusive technique that allows observers on the Internet to monitor user activities over long periods of time, even when the users' IP addresses change. Our technique uses semi-supervised machine learning, which allows observers to track users without requiring multiple labeled training sessions. We present evaluation results obtained on a realistic dataset that contains the DNS traffic of 3,800 users. Given one week of traffic, our simulated observers can link the sessions of up to 87% of the users with surprisingly little effort. Our results indicate that observers can leverage unlabeled sessions to increase the robustness of existing tracking techniques, which makes it more difficult for users to protect their privacy on the Internet.
