DNS weighted footprints for web browsing analytics

Abstract The monetization of the large amount of data that ISPs hold about their users is still in its early stages. Specifically, knowledge of the websites that specific users or aggregates of users visit opens new business opportunities, after convenient sanitization. However, building accurate DNS-based web-user profiles on large networks is a challenge, not only because of the requirements that capturing traffic entails, but also because of the use of DNS caches, the proliferation of botnets, and the complexity of current websites (i.e., when a user visits a website, a set of self-triggered DNS queries is issued for banners, from both same-company and third-party services, as well as for preloaded and prefetched content). We therefore propose to count the intentional visits users make to websites by means of DNS weighted footprints. This novel approach considers a website to have been actively visited if an empirically estimated fraction of the DNS queries of both the website itself and its set of self-triggered websites is found. The approach has been implemented in a system named DNS prints. After its parameterization (i.e., balancing the importance of a website in a footprint with respect to the total set of footprints), we have measured that our proposal identifies visits and their durations with false positive rates between 2 and 9% and true positive rates over 90%, at throughputs between 800,000 and 1.4 million DNS packets per second in diverse scenarios, thus proving both its refinement and applicability.
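The footprint idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula or the threshold, so everything below is an assumption made for illustration: the IDF-like weight (domains shared by many footprints count less), the function names `build_weights`, `visit_score` and `detect_visits`, and the threshold of 0.7 are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def build_weights(footprints):
    """Turn raw footprints into weighted footprints.

    footprints: dict mapping a website to the set of domains whose DNS
    queries are triggered when that website is visited.
    Assumption: a domain appearing in many footprints (e.g. a shared ad
    server) is less informative, so it gets a smaller, IDF-like weight.
    Weights within each footprint are normalized to sum to 1.
    """
    n_sites = len(footprints)
    doc_freq = Counter(d for domains in footprints.values() for d in domains)
    weights = {}
    for site, domains in footprints.items():
        raw = {d: math.log(1 + n_sites / doc_freq[d]) for d in domains}
        total = sum(raw.values())
        weights[site] = {d: w / total for d, w in raw.items()}
    return weights


def visit_score(site_weights, observed):
    """Weighted fraction of a footprint found among the observed domains."""
    return sum(w for d, w in site_weights.items() if d in observed)


def detect_visits(weights, observed, threshold=0.7):
    """Return the websites whose weighted footprint fraction reaches the
    threshold. The value 0.7 is a placeholder; the abstract only says the
    fraction is estimated empirically."""
    return sorted(
        site for site, w in weights.items()
        if visit_score(w, observed) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical footprints; "ads.example" is shared by both websites.
    footprints = {
        "news.example": {"news.example", "cdn.news.example", "ads.example"},
        "shop.example": {"shop.example", "img.shop.example", "ads.example"},
    }
    weights = build_weights(footprints)
    # Domains queried by one user within a short time window (made up).
    observed = {"news.example", "cdn.news.example", "ads.example"}
    print(detect_visits(weights, observed))
```

In this sketch, seeing only the shared domain `ads.example` does not count as a visit to either website, which mirrors the abstract's point that self-triggered queries alone should not be mistaken for intentional visits.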
