Synopsis based load shedding in XML streams

Stream systems are susceptible to variations in data arrival rate. At times, the arrival rate may spike, causing unacceptable output latencies and unpredictable system behavior. Load shedding systems have recently been proposed to deal with this situation, but almost all of them target relational data streams; to the best of our knowledge, none has been proposed for XML data streams so far except [15]. Dropping data randomly may be an effective load shedding method in the relational context because relational data is uniform. In the XML context, however, the same method has a far more invasive negative effect on query processing because of the recursive and nested structure of XML data. We propose a load shedding framework for XML data streams. We explore the effectiveness of various load shedding techniques based on a general load shedding strategy that takes into account QoS parameters and the relative accuracy of the query results. We implement various load shedding strategies and present their results.

[1]  Philip S. Yu, et al.  A Load Shedding Framework and Optimizations for M-way Windowed Stream Joins, 2007, IEEE 23rd International Conference on Data Engineering.

[2]  Neoklis Polyzotis, et al.  Approximate XML query answers, 2004, SIGMOD '04.

[3]  Michael Stonebraker, et al.  Load Shedding in a Data Stream Manager, 2003, VLDB.

[4]  Ranjan K. Dash, et al.  A Fully Pipelined XQuery Processor, 2006, XIME-P.

[5]  Johannes Gehrke, et al.  Querying and mining data streams: you only get one look a tutorial, 2002, SIGMOD '02.

[6]  Jeffrey Scott Vitter, et al.  CXHist: An On-line Classification-Based Histogram for XML String Selectivity Estimation, 2005, VLDB.

[7]  Jeffrey Scott Vitter, et al.  XPathLearner: An On-line Self-Tuning Markov Histogram for XML Path Selectivity Estimation, 2002, VLDB.

[8]  Jeffrey F. Naughton, et al.  Estimating the Selectivity of XML Path Expressions for Internet Scale Applications, 2001, VLDB.

[9]  Hongjun Lu, et al.  Containment join size estimation: models and methods, 2003, SIGMOD '03.

[10]  Song Liu, et al.  Load shedding in stream databases: a control-based approach, 2006, VLDB.

[11]  Neoklis Polyzotis, et al.  Structure and Value Synopses for XML Data Graphs, 2002, VLDB.

[12]  Alfredo Cuzzocrea, et al.  Synopsis Data Structures for XML Databases: Models, Issues, and Research Perspectives, 2007.

[13]  Neoklis Polyzotis, et al.  Selectivity estimation for XML twigs, 2004, Proceedings. 20th International Conference on Data Engineering.

[14]  Juliana Freire, et al.  StatiX: making XML count, 2002, SIGMOD '02.

[15]  Jignesh M. Patel, et al.  Estimating Answer Sizes for XML Queries, 2002, EDBT.

[16]  Hongjun Lu, et al.  Bloom Histogram: Path Selectivity Estimation for XML Data with Updates, 2004, VLDB.

[17]  Elke A. Rundensteiner, et al.  Utility-driven load shedding for xml stream processing, 2008, WWW.