An investigation of multiprocessor structures and algorithms for data base management

Rapid advances in semiconductor and disk technologies over the last few years have dramatically altered the role of computers in society. In particular, the use of computers for the storage, control, and dissemination of data has become a major factor in our modern economy. Traditionally, computers have not been designed with data base applications in mind, and only in recent years has a serious effort been made to design computing hardware specifically for data base applications. Though some data base applications perform well on conventional machines, a large class of problems is still prohibitively expensive. It is argued that, through the use of parallelism, greater performance improvements can be obtained for this class of problems than for the more traditional class. The use of multiple processors closely cooperating to solve a single problem in the data base environment is studied. A survey is made of other designs, including the Intel study of the Relational Associative Processor (RAP), originally proposed at the University of Toronto. A structured approach is taken to develop a set of principles by which a data base machine can be specified and designed. First, the technological capabilities and constraints for the 1980's are examined. Then a study is made of data base operations, and the most critical ones are identified. Next, one of these operations is examined to determine how it can be implemented on a variety of processor topologies, and the best topology for that operation is determined. Then algorithms are developed for implementing the most critical relational data base operation: join. A number of new join algorithms utilizing hashing methods are developed, and a number of well-known, conventional algorithms are extended to the multiple processor environment.
The cost of these methods is determined in terms of disk accesses and in terms of communication among the processors, and the algorithms are compared under a variety of circumstances. The hashing algorithms are shown to be superior to the more conventional algorithms, and are shown to be limited only by the bandwidth of the disk. The implications for the processor architecture are examined, and features that support the hashing methods well are identified. No exotic technologies are proposed. The approach is shown to be viable for the specification of the optimal data base machine using the technology of the 1980's.
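
As a rough illustration of the general idea behind a hash-based join on multiple processors, the sketch below partitions both relations by hashing the join attribute, so that matching tuples land on the same processor and each processor can join its own partitions independently. This is a generic partitioned hash join, not necessarily any of the algorithms developed in this work. The function names, the tuple layout, and the processor count are all assumptions made for the example, and the processors are simulated sequentially as Python lists.

```python
from collections import defaultdict


def partition(relation, key, n_procs):
    """Route each tuple to a simulated processor by hashing its join attribute."""
    buckets = [[] for _ in range(n_procs)]
    for row in relation:
        buckets[hash(row[key]) % n_procs].append(row)
    return buckets


def local_hash_join(r_part, s_part, r_key, s_key):
    """Join one pair of partitions: build a hash table on r_part, probe with s_part."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    return [(r, s) for s in s_part for r in table.get(s[s_key], [])]


def partitioned_hash_join(r, s, r_key, s_key, n_procs=4):
    """Equi-join r and s; each partition pair could be handled by a separate processor."""
    r_parts = partition(r, r_key, n_procs)
    s_parts = partition(s, s_key, n_procs)
    result = []
    for p in range(n_procs):  # sequential stand-in for parallel execution
        result.extend(local_hash_join(r_parts[p], s_parts[p], r_key, s_key))
    return result


# Hypothetical sample relations: (id, name, dept_id) and (dept_id, dept_name).
employees = [(1, "Ada", 10), (2, "Bob", 20), (3, "Cy", 10)]
depts = [(10, "R&D"), (30, "Ops")]
joined = sorted(partitioned_hash_join(employees, depts, r_key=2, s_key=0))
```

Because tuples with equal join values always hash to the same partition, no processor needs to see another processor's data once partitioning is done. In a sketch like this, each relation is read once and each tuple is shipped once, which is consistent with the observation above that such methods tend to be bounded by disk bandwidth rather than by processing.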