You Are How You Query: Deriving Behavioral Fingerprints from DNS Traffic

As the Domain Name System (DNS) plays an indispensable role in a large number of network applications including those used for malicious purposes, collecting and sharing DNS traffic from real networks are highly desired for a variety of purposes such as measurements and system evaluation. However, information leakage through the collected network traffic raises significant privacy concerns and DNS traffic is not an exception. In this paper, we study a new privacy risk introduced by passively collected DNS traffic. We intend to derive behavioral fingerprints from DNS traces, where each behavioral fingerprint targets at uniquely identifying its corresponding user and being immune to the change of time. We have proposed a set of new patterns, which collectively form behavioral fingerprints by characterizing a user’s DNS activities through three different perspectives including the domain name, the inter-domain relationship, and domains’ temporal behavior. We have also built a distributed system, namely DNSMiner, to automatically derive DNS-based behavioral fingerprints from a massive amount of DNS traces. We have performed extensive evaluation based on a large volume of DNS queries collected from a large campus network across two weeks. The evaluation results have demonstrated that a significant percentage of network users with persistent DNS activities are likely to have DNS behavioral fingerprints.

[1]  Akira Yamada,et al.  Passive OS Fingerprinting by DNS Traffic Analysis , 2013, 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA).

[2]  Leyla Bilge,et al.  EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis , 2011, NDSS.

[3]  Robert Tappan Morris,et al.  DNS performance and the effectiveness of caching , 2001, IMW '01.

[4]  Leyla Bilge,et al.  Exposure: A Passive DNS Analysis Service to Detect and Report Malicious Domains , 2014, TSEC.

[5]  Rui Wang,et al.  Side-Channel Leaks in Web Applications: A Reality Today, a Challenge Tomorrow , 2010, 2010 IEEE Symposium on Security and Privacy.

[6]  Wenke Lee,et al.  Modeling Botnet Propagation Using Time Zones , 2006, NDSS.

[7]  Christopher Krügel,et al.  A Practical Attack to De-anonymize Social Network Users , 2010, 2010 IEEE Symposium on Security and Privacy.

[8]  Wenke Lee,et al.  Detecting Malware Domains at the Upper DNS Hierarchy , 2011, USENIX Security Symposium.

[9]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[10]  George Pallis,et al.  Content Delivery Networks: Status and Trends , 2003, IEEE Internet Comput..

[11]  Charles V. Wright,et al.  Spot Me if You Can: Uncovering Spoken Phrases in Encrypted VoIP Conversations , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[12]  Fan Zhang,et al.  Inferring users' online activities through traffic analysis , 2011, WiSec '11.

[13]  Anees Shaikh,et al.  On the effectiveness of DNS-based server selection , 2001, Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213).

[14]  Felix C. Freiling,et al.  Measuring and Detecting Fast-Flux Service Networks , 2008, NDSS.

[15]  Srinivasan Seshan,et al.  802.11 user fingerprinting , 2007, MobiCom '07.

[16]  Hannes Federrath,et al.  Behavior-based tracking: Exploiting characteristic patterns in DNS traffic , 2013, Comput. Secur..

[17]  Charles V. Wright,et al.  Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob? , 2007, USENIX Security Symposium.

[18]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[19]  Brian Neil Levine,et al.  Inferring the source of encrypted HTTP connections , 2006, CCS '06.

[20]  Lusheng Ji,et al.  Characterizing and modeling internet traffic dynamics of cellular devices , 2011, SIGMETRICS '11.

[21]  Angelos D. Keromytis,et al.  Taming the Devil: Techniques for Evaluating Anonymized Network Data , 2008, NDSS.

[22]  Fabian Monrose,et al.  DNS Prefetching and Its Privacy Implications: When Good Things Go Bad , 2010, LEET.

[23]  Lili Qiu,et al.  Statistical identification of encrypted Web browsing traffic , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[24]  Vern Paxson,et al.  Practical Comprehensive Bounds on Surreptitious Communication over DNS , 2013, USENIX Security Symposium.

[25]  Roberto Perdisci,et al.  From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware , 2012, USENIX Security Symposium.