A practical evaluation of load shedding in data stream management systems for network monitoring

In network monitoring, an important issue is the number of tuples the data stream management system (DSMS) can handle under different network loads. To handle overload situations gracefully, some DSMSs are equipped with tuple-dropping functionality, also known as load shedding. These DSMSs register and relate the number of received and dropped tuples, i.e., the relative throughput, and perform different kinds of calculations on them. Over the past few years, several solutions and methods have been suggested to perform load shedding efficiently. The simplest approach is to keep a count of all the dropped tuples and to report this count to the end user. In our experiments, we study two DSMSs: TelegraphCQ, which supports load shedding, and STREAM, which does not. We use three particular network monitoring tasks to evaluate the two DSMSs with respect to their load shedding ability and performance. We demonstrate that it is important to investigate the correctness of load shedding by showing that the reported number of dropped tuples is not always correct.
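As a rough illustration of the bookkeeping described above, the following Python sketch relates received and dropped tuples and compares a reported drop count against an independent measurement. It is not taken from TelegraphCQ or STREAM. The names `ShedReport` and `drop_count_error` are hypothetical, and the definition of relative throughput as the fraction of received tuples that were not dropped is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class ShedReport:
    """Counters a DSMS might expose (hypothetical structure)."""
    received: int          # tuples the DSMS says arrived at its input
    reported_dropped: int  # tuples the DSMS says it shed

    def relative_throughput(self) -> float:
        """Fraction of received tuples that were not dropped (assumed definition)."""
        if self.received == 0:
            return 1.0  # nothing arrived, so nothing was lost
        return (self.received - self.reported_dropped) / self.received


def drop_count_error(sent: int, delivered: int, reported_dropped: int) -> int:
    """Reported drops minus drops observed independently.

    `sent` is counted at the traffic source and `delivered` at the query
    output; a non-zero result means the reported drop count disagrees with
    the independent measurement.
    """
    observed_dropped = sent - delivered
    return reported_dropped - observed_dropped
```

For instance, `drop_count_error(sent=1000, delivered=700, reported_dropped=250)` is negative, which would indicate that the system reported fewer drops than were observed from outside.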
