Customizable Parallel Execution of Scientific Stream Queries

Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that supports scalable and flexible continuous queries on such streams. Application-dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates. Using a generic template, we define two partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides operators for parallel execution of query functions by reducing the size of stream data units, using application-dependent functions as parameters. By contrast, window distribute provides operators for customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high-volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribute is better otherwise.
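The difference between the two partitioning strategies can be illustrated with a minimal sketch. This is not the system's actual operator set or query language: the function names (`window_split`, `window_distribute`, `split_evenly`, `round_robin`) are hypothetical, a stream is modeled as a list of windows, and a local thread pool stands in for distributed execution nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def split_evenly(window, n):
    """Hypothetical application-dependent split: cut a window into at most n chunks."""
    size = max(1, ceil(len(window) / n))
    return [window[k:k + size] for k in range(0, len(window), size)]


def round_robin(index, window, n):
    """Hypothetical default routing: send window number `index` to worker index % n."""
    return index % n


def window_split(windows, n, split, query, combine):
    """Reduce the size of each data unit: split it, query the parts in parallel, combine."""
    results = []
    with ThreadPoolExecutor(max_workers=n) as pool:
        for window in windows:
            partials = list(pool.map(query, split(window, n)))
            results.append(combine(partials))
    return results


def window_distribute(windows, n, query, route=round_robin):
    """Keep data units whole: route each entire window to one of n workers."""
    queues = [[] for _ in range(n)]
    for i, window in enumerate(windows):
        queues[route(i, window, n)].append((i, window))

    def run_worker(queue):
        # Each worker processes its assigned windows sequentially.
        return [(i, query(w)) for i, w in queue]

    with ThreadPoolExecutor(max_workers=n) as pool:
        tagged = [pair for batch in pool.map(run_worker, queues) for pair in batch]
    # Restore the original stream order before returning results.
    return [result for _, result in sorted(tagged, key=lambda pair: pair[0])]


if __name__ == "__main__":
    stream = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(window_split(stream, 2, split_evenly, sum, sum))
    print(window_distribute(stream, 2, sum))
```

In this sketch, window split needs a split function and a combine function that are valid for the query at hand (here, summation), whereas window distribute only needs a routing function and leaves each window intact.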
