Control Versus Data Flow in Parallel Database Machines

The execution of a query in a parallel database machine can be controlled in either a control flow or a data flow manner. In the former case, a single system node controls the entire query execution. In the latter case, the processes that execute the query trigger each other, even though they may run on different nodes of the system. Many recent database research projects focus on data flow control because it is expected to improve response times and throughput. The authors study control versus data flow as mechanisms for controlling the execution of database queries. An analytical model is used to compare the two and to determine which mechanism performs better under which circumstances. Several systems that use data flow techniques are also described, and the authors investigate to what degree these systems are truly data flow. The results show that for particular types of queries data flow is very attractive, since it reduces the number of control messages and balances these messages over the nodes.
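The claim about control messages can be illustrated with a toy count. The Python sketch below is not the authors' analytical model; it assumes a hypothetical linear pipeline of operators, each on its own node, and assumes that a central scheduler exchanges one start and one done message with every operator under control flow, whereas under data flow it sends a single initial trigger and receives a single completion message while each operator triggers its successor. All function and node names are made up for the illustration.

```python
from collections import Counter


def control_flow_load(num_operators):
    """Per-node count of control messages sent or received (assumed model).

    The scheduler sends a start message to every operator and receives a
    done message from every operator.
    """
    load = Counter()
    for op in range(num_operators):
        node = f"op{op}"
        # start message: scheduler -> operator
        load["scheduler"] += 1
        load[node] += 1
        # done message: operator -> scheduler
        load[node] += 1
        load["scheduler"] += 1
    return load


def data_flow_load(num_operators):
    """Per-node count of control messages sent or received (assumed model).

    The scheduler triggers only the first operator; every operator then
    triggers its successor, and the last one reports completion.
    """
    load = Counter()
    # initial trigger: scheduler -> first operator
    load["scheduler"] += 1
    load["op0"] += 1
    # trigger messages between neighbouring operators
    for op in range(num_operators - 1):
        load[f"op{op}"] += 1
        load[f"op{op + 1}"] += 1
    # completion message: last operator -> scheduler
    load[f"op{num_operators - 1}"] += 1
    load["scheduler"] += 1
    return load


def total_messages(load):
    """Each message is counted once at its sender and once at its receiver."""
    return sum(load.values()) // 2


if __name__ == "__main__":
    for k in (2, 4, 8):
        cf, df = control_flow_load(k), data_flow_load(k)
        print(k, total_messages(cf), max(cf.values()),
              total_messages(df), max(df.values()))
```

Under these assumptions a pipeline of k operators needs 2k control messages with central control, all of which pass through the scheduler, and k + 1 messages with data flow triggering, with no node handling more than two. Other query shapes or control protocols would give different counts.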
