Inferring Popularity of Domain Names with DNS Traffic: Exploiting Cache Timeout Heuristics

Popularity ranking of Internet services is an important metric for network operators, because it enables mid- to-long term planning of their network facilities and root cause analysis for unexpected traffic. The service-oriented traffic monitoring is much helpful to infer the popularity, hence it has been gathering much attention from both researchers and practitioners. Lately, service identification of a given flow has become very difficult due to the rapid growth of CDNs and/or encrypted traffic, while some research works employed preceding DNS traffic as a hint. However, because of its cache mechanism, the DNS message count deviates from the actual number of flows, which can greatly degrade the ranking reliability. We propose a theoretical model for inferring the user's number of accesses per domain name by exploiting the characteristics of the DNS message count. To the best of our knowledge, this paper is the first attempt to formulate the effect of user's stub resolvers; previous studies were focused on analyzing the effect of cache servers. We evaluated the precision of our model with a real dataset of traffic of thousands of users. By analyzing the top-50 domain names by the number of users, we can infer the number of flows within a 24% error rate on average in 42 out of 50 FQDNs.