Sampling for Passive Internet Measurement: A Review

Sampling has become an integral part of passive network measurement. Its role is driven by the need to control resource consumption in the measurement infrastructure in the face of increasing traffic rates and growing demand for detailed measurements from applications and service providers. Classical sampling methods play an important role in the current practice of Internet measurement. The aims of this review are (i) to explain the classical sampling methodology in the context of the Internet to readers who are not necessarily acquainted with either, (ii) to give an account of newer applications and sampling methods for passive measurement, and (iii) to identify emerging areas that are ripe for the application of statistical expertise.
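To make the notion of classical sampling concrete, here is a minimal sketch of one simple case. It assumes independent per-packet (Bernoulli) sampling with a known probability p, followed by a Horvitz-Thompson style inverse-probability estimate of total traffic volume. The function names `sample_packets` and `horvitz_thompson_total` and the synthetic packet sizes are hypothetical illustrations. They are not taken from the review and do not describe any specific router or monitoring product.

```python
import random
from typing import Iterable, List


def sample_packets(sizes: Iterable[int], p: float, rng: random.Random) -> List[int]:
    """Independent (Bernoulli) packet sampling: keep each packet with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("sampling probability must be in (0, 1]")
    # rng.random() is uniform on [0, 1), so p == 1.0 keeps every packet.
    return [s for s in sizes if rng.random() < p]


def horvitz_thompson_total(sampled_sizes: Iterable[int], p: float) -> float:
    """Unbiased estimate of total bytes: weight each sampled packet by 1/p."""
    return sum(s / p for s in sampled_sizes)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical traffic: 100,000 packets with sizes between 40 and 1500 bytes.
    sizes = [rng.randint(40, 1500) for _ in range(100_000)]
    true_total = sum(sizes)
    p = 0.01  # roughly 1-in-100 sampling
    sample = sample_packets(sizes, p, rng)
    estimate = horvitz_thompson_total(sample, p)
    print(f"true={true_total} sampled={len(sample)} estimate={estimate:.0f}")
    print(f"relative error={abs(estimate - true_total) / true_total:.3%}")
```

Weighting each retained packet by 1/p makes the estimate unbiased. Its variance is sum(s^2) * (1 - p) / p, so it grows as p shrinks. This trade-off between measurement resources (how many packets are kept) and estimation accuracy is the one that motivates the resource-control discussion above.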