Transparent identification of network flow and its security applications

Many people believe that flow transformations such as encryption, network address translation (NAT), timing perturbation, traffic padding (adding cover traffic or bogus packets), packet dropping, flow mixing, and flow splitting and merging make one network flow virtually indistinguishable from another. This belief is reflected in the design of existing anonymizing communication systems and in the practices attackers use to hide their source and identity. This dissertation challenges this common belief by investigating network flow identification and its security applications. Specifically, it proposes an active, timing-based, transparent network flow identification technique: an interval centroid-based watermarking scheme. The technique can make any sufficiently long flow uniquely identifiable even if (1) it is mixed or merged with a number of other flows, (2) it is split into a number of subflows, and (3) a substantial portion of its packets is dropped. To the best of our knowledge, it is the only technique in the literature with these capabilities for flow identification. This dissertation explores two security applications of flow identification techniques: tracking anonymous peer-to-peer Voice over IP (VoIP) calls on the Internet, and providing authentication services for streaming data. It demonstrates that tracking anonymous peer-to-peer VoIP calls on the Internet is feasible and that authenticating streaming data without communication overhead is practical. The study shows that traditional flow transformations, such as encryption, NAT, timing perturbation, traffic padding, packet dropping, and flow mixing, splitting, and merging, do not necessarily provide the level of anonymity that people have expected or believed.
The results of this study also demonstrate that achieving unlinkability of sender and receiver in low-latency communication systems is much harder than we have realized, and that current flow transformation-based low-latency anonymous communication systems need to be revisited. This dissertation also provides the design and implementation of a real-time, high-precision watermarking engine. It is the first mechanism that enables one to delay any specified packet of any specified packet flow for any specified duration with a precision of 100 microseconds, and this precision is guaranteed regardless of the current CPU workload. This design and implementation not only ensures the practicality of the proposed network flow identification technique, but also enriches the repertoire of much-needed network simulation toolkits.