dQUOB: managing large data flows using dynamic embedded queries

The dQUOB system satisfies clients' need for specific information from high-volume data streams. The data streams in question are the flows of data that arise during large-scale visualizations, video streaming to large numbers of distributed users, and high-volume business transactions. We introduce the notion of conceptualizing a data stream as a set of relational database tables, so that a scientist can request information with an SQL-like query. Transformation or computation that often needs to be performed on the data en route can be conceptualized as computation performed on consecutive views of the data, with computation associated with each view. The dQUOB system moves the query code into the data stream as a quoblet, that is, as compiled code. The relational data model has the significant advantage of presenting opportunities for efficient reoptimization of queries and sets of queries. Using examples from global atmospheric modeling, we illustrate the usefulness of the dQUOB system. We carry the examples through the experiments to establish, with a baseline benchmark, the viability of the approach for high-performance computing. We define a cost metric of end-to-end latency that can be used to determine realistic cases where optimization should be applied. Finally, we show that end-to-end latency can be controlled through a probability, assigned to each query, that the query will evaluate to true.
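The idea of treating a data stream as a relational table and querying it in an SQL-like way can be illustrated with a minimal Python sketch. This is not the dQUOB implementation or its query language: the function `make_query`, the event fields (`lat`, `lon`, `level`, `ozone`), and the sample values are all hypothetical, chosen only to show a select-and-project filter applied to events as they flow past.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: each stream event is one "row" with named numeric fields.
Event = Dict[str, float]
Predicate = Callable[[Event], bool]


def make_query(predicates: List[Predicate],
               columns: List[str]) -> Callable[[Iterable[Event]], Iterator[Event]]:
    """Build a filter roughly equivalent to
    SELECT <columns> FROM stream WHERE <all predicates hold>."""
    def run(stream: Iterable[Event]) -> Iterator[Event]:
        for event in stream:
            # all() short-circuits, so later predicates are skipped
            # as soon as one predicate rejects the event.
            if all(p(event) for p in predicates):
                yield {c: event[c] for c in columns}
    return run


# Hypothetical atmospheric-model events (values are illustrative only).
events = [
    {"lat": 10.0, "lon": 20.0, "level": 1, "ozone": 0.2},
    {"lat": 30.0, "lon": 40.0, "level": 2, "ozone": 0.7},
    {"lat": 50.0, "lon": 60.0, "level": 5, "ozone": 0.9},
]

query = make_query(
    predicates=[lambda e: e["ozone"] > 0.5, lambda e: e["level"] <= 2],
    columns=["lat", "lon", "ozone"],
)

result = list(query(events))
print(result)
```

Because the filter is a generator, it processes one event at a time and never needs the whole stream in memory, which is the property that makes the relational view usable on a live data flow.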
