Applying Data Flow Techniques to Data Base Machines

Recently, data flow languages and architectures have been the subject of much research. Although most projects set out to produce a general-purpose data flow machine, the primary goal has been to reduce the execution time of large numerical computations. In this article, we describe a different line of research: the application of data flow machine principles to improving access to large nonnumerical data bases.

For the past four years we have studied the problems of providing efficient access to relational data bases that are too large to be handled by a single conventional processor. The result of that research is Direct, a multiprocessor, multiple-instruction-stream, multiple-data-stream (MIMD) relational data base machine.[1] A prototype of Direct that supports the relational data base system Ingres[2] became operational in June 1980, using a PDP 11/40 and eight PDP 11/23s.

All previous data base machines were single-instruction-stream, multiple-data-stream (SIMD) machines and could thus execute only one data base operation at a time. One consequence of an SIMD design is that the activities of the processors do not have to be scheduled; during each time unit, all processors execute the same instruction. With an MIMD design, such as Direct, groups of processors can work on different instructions from the same query, from different queries, or both, so processors must be explicitly allocated to instructions. Therefore, after the design of Direct was completed, alternative processor allocation strategies for MIMD data base machines were explored.

The goal of this research was to determine which strategy would allow the greatest number of queries to be executed per time unit. One of the strategies examined was based on data flow machine principles. In this article, we describe some of the results obtained from our experiments with the various strategies,[3] in particular our experience with variations of a strategy based on data flow techniques for processor allocation.
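
To make the allocation problem concrete, the following is a minimal sketch of what a data-flow-style processor allocator might look like. It is not Direct's actual algorithm. Everything in it is an assumption made for illustration: the `Instruction` class, the `schedule` function, the instruction names, the rule that every instruction finishes in one time step, and the even split of processors among enabled instructions. The only idea taken from the text is that an MIMD machine can run instructions from the same query and from different queries at the same time, and that a data flow approach lets an instruction run as soon as its operands are available.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    """One relational operation in a query tree (hypothetical model)."""
    name: str
    query: str
    inputs: list = field(default_factory=list)  # names of producer instructions
    done: bool = False


def enabled(instr, by_name):
    """Data flow firing rule: runnable once every input has been produced."""
    return not instr.done and all(by_name[i].done for i in instr.inputs)


def schedule(instructions, num_processors):
    """Return one dict per time step mapping instruction name -> processors.

    Assumption: each instruction completes in a single time step, and the
    processors are divided as evenly as possible among enabled instructions.
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")
    by_name = {i.name: i for i in instructions}
    steps = []
    while any(not i.done for i in instructions):
        ready = [i for i in instructions if enabled(i, by_name)]
        if not ready:
            raise ValueError("no enabled instruction: cyclic dependency")
        share, extra = divmod(num_processors, len(ready))
        step = {}
        for k, instr in enumerate(ready):
            n = share + (1 if k < extra else 0)
            if n > 0:  # instructions given no processor wait for a later step
                step[instr.name] = n
        for name in step:
            by_name[name].done = True
        steps.append(step)
    return steps


# Two queries sharing eight processors (names are illustrative only).
program = [
    Instruction("restrict_a", "Q1"),
    Instruction("restrict_b", "Q1"),
    Instruction("join_ab", "Q1", inputs=["restrict_a", "restrict_b"]),
    Instruction("restrict_c", "Q2"),
]
steps = schedule(program, num_processors=8)
for t, step in enumerate(steps):
    print(t, step)
```

In this sketch the first time step runs three instructions from two different queries side by side, which an SIMD machine could not do, and the join starts only after both of its inputs are complete. A real allocator would also have to account for operand sizes and for instructions that run longer than one step.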