Fast monitoring of traffic subpopulations

Network accounting, forensics, security, and performance monitoring applications often need to examine detailed traces from subsets of flows ("subpopulations"), and they need flexibility in specifying those subpopulations. For example, to detect a port scan, the application must observe many packets between a source and a destination, with one packet to each port. However, the dynamism and volume of traffic on many high-speed links necessitate traffic sampling, which adversely affects subpopulation monitoring: because many subpopulations of interest to operators consist of low-volume flows, conventional sampling schemes (e.g., uniform random sampling) miss much of the subpopulation's traffic. Today's routers and network devices provide scant support for monitoring specific traffic subpopulations. This paper presents the design, implementation, and evaluation of FlexSample, a traffic monitoring engine that dynamically extracts traffic from subpopulations that operators define using conditions on packet header fields. FlexSample uses a fast, flexible counter array to provide rough estimates of each packet's membership in the defined subpopulations. Based on these coarse estimates, FlexSample then makes per-packet sampling decisions so that each subpopulation is sampled in the proportion specified by the network operator, subject to an overall sampling constraint. We apply FlexSample to extract subpopulations such as port scans and traffic to high-degree nodes and find that it captures significantly more packets from these subpopulations than conventional approaches.
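
The two-step idea described above (coarse membership estimates from a counter array, then per-packet sampling decisions under an overall budget) can be illustrated with a minimal Python sketch. This is not FlexSample's implementation: the class `CounterArray`, the functions `sampling_probability` and `sample_stream`, the single hashed counter per source, and the membership condition "source has sent at most `threshold` packets" are all hypothetical choices made for illustration only.

```python
import random
import zlib


class CounterArray:
    """Hypothetical fixed-size array of hashed counters (collisions allowed)."""

    def __init__(self, size=1024):
        self.size = size
        self.counts = [0] * size

    def update(self, key):
        # Hash the key to one slot, increment it, and return the new count.
        index = zlib.crc32(key.encode()) % self.size
        self.counts[index] += 1
        return self.counts[index]


def sampling_probability(in_subpop, budget, subpop_share, subpop_fraction):
    """Per-packet sampling probability.

    budget:          overall fraction of packets that may be sampled
    subpop_share:    fraction of the sampled packets wanted from the subpopulation
    subpop_fraction: estimated fraction of all traffic in the subpopulation
    """
    if in_subpop:
        p = budget * subpop_share / max(subpop_fraction, 1e-9)
    else:
        p = budget * (1.0 - subpop_share) / max(1.0 - subpop_fraction, 1e-9)
    return min(1.0, p)


def sample_stream(sources, budget, subpop_share, threshold, seed=0):
    """Sample packets (identified here only by source) from a stream."""
    rng = random.Random(seed)
    counters = CounterArray()
    seen = 0
    members_seen = 0
    sampled = []
    for src in sources:
        # Assumed condition: low-volume sources form the subpopulation.
        is_member = counters.update(src) <= threshold
        seen += 1
        members_seen += is_member
        fraction = members_seen / seen  # running estimate of subpopulation size
        if rng.random() < sampling_probability(is_member, budget, subpop_share, fraction):
            sampled.append(src)
    return sampled


if __name__ == "__main__":
    traffic = ["heavy"] * 1000 + ["light%d" % i for i in range(20)]
    random.Random(1).shuffle(traffic)
    picked = sample_stream(traffic, budget=0.1, subpop_share=0.5, threshold=5)
    print(len(picked), sum(1 for s in picked if s != "heavy"))
```

With a 10% budget and half of the samples reserved for the subpopulation, packets from rare sources are selected with a much higher probability than under uniform sampling, while the frequent source is sampled at a lower rate to stay within the budget. Because the probability is capped at 1, the subpopulation's target share may not be met when it makes up a very small fraction of the traffic.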
