Preferential Resource Allocation in Stream Processing Systems

Overloaded data stream management systems (DSMS) cannot process all tuples within their required response time. For some DSMS applications it is therefore crucial to allocate scarce resources to the most significant tuples. Prior work has applied load shedding, which permanently drops insignificant tuples, and spilling, which temporarily moves them to disk. However, neither approach considers that tuple significance can be multi-tiered, nor that determining significance can itself be costly; both treat all tuples that are not dropped as equally significant. In contrast, we pull the most significant tuples forward throughout the query pipeline. Proactive Promotion (PP), a new DSMS methodology for preferential CPU resource allocation, selectively processes the most significant tuples ahead of less significant ones. Our optimizer produces an optimal PP plan that minimizes the processing latency of tuples in the most significant tiers of this multi-tiered precedence scheme by strategically placing significance determination operators throughout the query pipeline at compile time and selectively activating them at run time. Our experimental results show that, compared to state-of-the-art shedding and traditional DSMS approaches, PP lowers the latency and increases the throughput of significant results by a factor of 2 to 18 across a diverse range of datasets, with negligible overhead.
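The idea of serving more significant tiers ahead of less significant ones can be illustrated with a small priority-queue sketch. This is not the PP engine or its optimizer; it only shows tier-ordered processing under a fixed CPU budget. The names `TieredQueue`, `significance_tier`, and `process`, the `score` field, and the thresholds are all hypothetical and chosen for illustration.

```python
import heapq
from itertools import count


class TieredQueue:
    """Operator input queue that serves more significant tiers first.

    Assumption of this sketch: tier 0 is the most significant tier.
    """

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: preserves arrival order within a tier

    def push(self, item, tier):
        heapq.heappush(self._heap, (tier, next(self._seq), item))

    def pop(self):
        tier, _, item = heapq.heappop(self._heap)
        return tier, item

    def __len__(self):
        return len(self._heap)


def significance_tier(tup):
    """Hypothetical significance determination: map a tuple to a tier."""
    if tup["score"] >= 0.9:
        return 0
    if tup["score"] >= 0.5:
        return 1
    return 2


def process(tuples, budget):
    """Process at most `budget` tuples, most significant tier first."""
    queue = TieredQueue()
    for t in tuples:
        queue.push(t, significance_tier(t))
    processed = []
    while queue and len(processed) < budget:
        tier, t = queue.pop()
        processed.append((tier, t["id"]))
    return processed


tuples = [
    {"id": "a", "score": 0.1},
    {"id": "b", "score": 0.95},
    {"id": "c", "score": 0.6},
    {"id": "d", "score": 0.92},
]
print(process(tuples, budget=3))  # [(0, 'b'), (0, 'd'), (1, 'c')]
```

With a budget of three tuples, the two tier-0 tuples are handled first and the tier-2 tuple `a` waits, even though it arrived earliest. Unlike shedding, nothing is dropped: `a` remains eligible once capacity frees up.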
