Join and Semijoin Algorithms for a Multiprocessor Database Machine

This paper presents and analyzes algorithms for computing joins and semijoins of relations in a multiprocessor database machine. First, a model of the multiprocessor architecture is described; its parameters define I/O, CPU, and message transmission times, from which the execution times of the algorithms can be calculated. Then, three join algorithms are presented and compared. It is shown that, for a given configuration, each algorithm has an application domain defined by the characteristics of the operand and result relations. Since the semijoin operator is useful for decreasing I/O and transmission times in a multiprocessor system, we present and compare two equi-semijoin algorithms and one non-equi-semijoin algorithm. The execution times of these algorithms are generally proportional to the sizes of the operand and result relations, and inversely proportional to the number of processors. We then compare joining two relations directly with joining their semijoins, and show that the latter method is generally better. The algorithms presented are implemented in the SABRE database system; an evaluation model selects the best algorithm for performing a join according to the results presented here. A first version of the SABRE system is currently operational at INRIA.
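To make the comparison between a direct join and a join of semijoins concrete, here is a minimal single-process sketch in Python. It is not one of the paper's algorithms: the hash-based `equi_join`, the `semijoin` helper, and the sample relations `R` and `S` are all assumptions chosen for illustration. It shows only that reducing each relation by a semijoin first leaves the join result unchanged while shrinking the operands that would have to be read and transmitted.

```python
from collections import defaultdict


def equi_join(r, s, r_key, s_key):
    """Hash equi-join (illustrative): pair each tuple of r with the tuples of s
    whose attribute s_key equals the r tuple's attribute r_key."""
    index = defaultdict(list)
    for t in s:
        index[t[s_key]].append(t)
    return [(a, b) for a in r for b in index.get(a[r_key], [])]


def semijoin(r, s, r_key, s_key):
    """Equi-semijoin (illustrative): keep the tuples of r that have at least
    one matching tuple in s, without producing any joined tuples."""
    keys = {t[s_key] for t in s}
    return [a for a in r if a[r_key] in keys]


# Hypothetical sample relations; attribute 0 is the join attribute.
R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(2, "x"), (2, "y"), (4, "z"), (5, "w")]

# Method 1: join the two relations directly.
direct = equi_join(R, S, 0, 0)

# Method 2: reduce each relation by a semijoin, then join the reductions.
r_reduced = semijoin(R, S, 0, 0)
s_reduced = semijoin(S, R, 0, 0)
via_semijoins = equi_join(r_reduced, s_reduced, 0, 0)
```

Both methods yield the same joined tuples here, but the second joins two tuples of `R` and three of `S` instead of four of each. Whether that reduction pays for the extra semijoin passes depends on the I/O, CPU, and transmission parameters that the paper's model accounts for.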
