Supporting Flow-Cardinality Queries with O(1) Time Complexity in High-speed Networks

In high-speed networks, such as the Internet backbone, a router may witness millions of IP packet flows passing through concurrently. Maintaining the state of each flow is a fundamental task underlying many network functions, such as load balancing and network anomaly detection. There are two important kinds of per-flow state: per-flow size (e.g., the number of packets received by a given destination IP address) and per-flow cardinality (e.g., the number of distinct source IP addresses that contacted each destination IP address). In this paper, we focus on the latter and propose a new problem, online flow-cardinality query, in which any given flow’s cardinality must be queried entirely on the data plane with low time complexity. We propose two solutions, On-vHLL and On-vLLC, whose query operation runs in $\mathcal{O}(1)$ time. Our query acceleration techniques are threefold. First, we redesign the traditional vHLL and vLLC with new supplementary data structures called incremental update units. When a flow’s cardinality is queried, these units make it unnecessary to scan the whole data structure, reducing the query time complexity to $\mathcal{O}(1)$. Second, we adopt the LogLogCount estimation formula to avoid floating-point calculation. Third, we add a fast path, implemented as a hash table, alongside the relatively slower On-vHLL or On-vLLC sketch. The fast path absorbs the packets belonging to the top-k superspreaders detected in the previous time interval. We evaluate our new sketches experimentally on CAIDA traffic traces. The results show that our sketches need fewer than 5 memory accesses per arriving packet. The time cost of our query operation decreases by a factor of hundreds, while the accuracy of flow-cardinality estimation degrades only modestly, by 20%, compared with the counterpart vHLL.
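To make the idea of answering a query without scanning the whole data structure concrete, the following is a minimal Python sketch of a generic single-flow LogLog-style register array that keeps a running sum of its registers. It is not the On-vHLL or On-vLLC design, whose incremental update units are not specified in this abstract; the class name `RunningSumLogLog`, the choice of SHA-256 as the hash, the register count, and the use of the asymptotic LogLog constant are all assumptions made for illustration. Note also that this sketch still uses a floating-point exponent in `estimate`, so it does not show how floating-point calculation is avoided.

```python
import hashlib

# Asymptotic bias-correction constant of the classic LogLog estimator
# (an assumption here; the paper's exact constants are not given in the abstract).
ALPHA = 0.39701


class RunningSumLogLog:
    """Hypothetical single-flow LogLog register array with an O(1) query."""

    def __init__(self, num_registers=256):
        # num_registers is assumed to be a power of two.
        self.m = num_registers
        self.index_bits = num_registers.bit_length() - 1
        self.registers = [0] * num_registers
        # Running sum of all registers, maintained on every update so that
        # a query never has to scan the register array.
        self.register_sum = 0

    def add(self, element):
        # Derive a 64-bit hash of the element (e.g., a source IP string).
        digest = hashlib.sha256(element.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h & (self.m - 1)            # low bits choose the register
        rest = h >> self.index_bits       # remaining bits give the rank
        rest_bits = 64 - self.index_bits
        # Rank = 1-based position of the lowest set bit of the remaining bits.
        rank = (rest & -rest).bit_length() if rest else rest_bits + 1
        old = self.registers[idx]
        if rank > old:
            self.registers[idx] = rank
            self.register_sum += rank - old   # incremental update

    def estimate(self):
        # LogLog formula: alpha * m * 2^(mean register value), computed
        # from the running sum in constant time.
        return ALPHA * self.m * 2 ** (self.register_sum / self.m)


if __name__ == "__main__":
    sketch = RunningSumLogLog()
    for i in range(10000):
        sketch.add("10.0.%d.%d" % (i // 256, i % 256))
    print(sketch.estimate())
```

The design choice illustrated is that each update adjusts the running sum by the change in a single register, so the cost of a query no longer depends on the number of registers.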