Characterizing large DNS traces using graphs

The increasing deployment of overlay networks that rely on DNS tricks has increased interest in examining DNS traffic. In this paper we report on a characterization of DNS traffic gathered over a period of several weeks at Internet Gateway Routers (IGRs) in the AT&T Common Backbone. The characterization is carried out using several novel techniques to identify clients, local DNS servers, and authoritative DNS servers. Our techniques include passive and active measurements, graph-based analysis, examination of outliers, and explicit checks against data obtained from several external sources. Our contribution is the reduction of a very large data set (over 1 terabyte of raw data) into a significantly smaller representation that is well suited to answering protocol-specific semantic queries quickly. After categorizing the addresses, we use the network-aware clustering technique to group local DNS servers. By juxtaposing the DNS server clusters with clusters formed by Web clients obtained from a large portal Web site, we determine the distribution of identified DNS servers in busy clusters. We examine a variety of applications, ranging from identifying suspected zombies to helping Content Distribution Networks map the locations of DNS servers.
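As an illustration of the clustering step mentioned above, network-aware clustering groups IP addresses by the longest routing prefix that contains them. The following is a minimal sketch of that idea, not the implementation used in this work: the function name cluster_by_prefix, the "unclustered" label, and the sample prefixes and addresses (taken from the reserved documentation ranges) are assumptions made for the example. In practice the prefix list would be far larger, typically derived from routing table snapshots, and a trie would replace the linear scan.

```python
import ipaddress
from collections import defaultdict


def cluster_by_prefix(addresses, prefixes):
    """Group addresses by their longest matching prefix (hypothetical helper)."""
    # Sort most-specific prefixes first so the first hit is the longest match.
    networks = sorted(
        (ipaddress.ip_network(p) for p in prefixes),
        key=lambda n: n.prefixlen,
        reverse=True,
    )
    clusters = defaultdict(list)
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        # Linear scan for clarity; assumed small prefix list in this sketch.
        match = next((n for n in networks if ip in n), None)
        key = str(match) if match is not None else "unclustered"
        clusters[key].append(addr)
    return dict(clusters)


# Sample data (assumed, documentation-only address ranges).
prefixes = ["192.0.2.0/24", "192.0.2.128/25", "198.51.100.0/24"]
servers = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.5"]
clusters = cluster_by_prefix(servers, prefixes)
```

Here 192.0.2.200 falls in both 192.0.2.0/24 and 192.0.2.128/25 and is assigned to the more specific /25, while 203.0.113.5 matches no prefix and is left unclustered. Running the same function over a set of Web client addresses would give a second set of clusters keyed by the same prefixes, which is what makes the two populations directly comparable.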