Analysis of Pipelined External Sorting on a Reconfigurable Message-Passing Multicomputer

Abstract

External sorting is a frequent operation in relational database systems, sometimes as a step in important operations such as joins. External sorting performance on a parallel system is therefore a key indicator of system performance for database applications. However, external sorting on multicomputers is not as well understood as parallel internal sorting, in which the keys reside in main memory. In many cases, analysis is performed under assumptions such as unlimited resources (number of processors, amount of memory, network bandwidth) and fully overlapped use of resources, which limits its applicability in practice. External sorting typically involves the creation of multiple sorted runs (Step 1) followed by merging of the sorted runs (Step 2). In this paper, we present an analytical model for Step 1 using pipelined sort on a message-passing multicomputer with shared disks. The model includes parameters representing system configuration, performance of system components, software-related choices, and problem size. The execution time predicted by the model is compared with experimental results on a transputer-based system reported recently by the authors [5]. Based on the model, the impact of system scale-up and of faster components is investigated. The model is general enough for use in benchmarking other message-based machines.
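The two-step structure mentioned above can be illustrated with a minimal single-process sketch. It is not the pipelined multicomputer sort modeled in this paper: in-memory lists stand in for disk-resident runs, and the names `create_runs`, `merge_runs`, and `memory_capacity` are hypothetical, chosen only for illustration.

```python
import heapq
from typing import Iterable, Iterator, List


def create_runs(keys: Iterable[int], memory_capacity: int) -> List[List[int]]:
    """Step 1: cut the input into chunks of at most memory_capacity keys
    and sort each chunk, producing one sorted run per chunk."""
    if memory_capacity < 1:
        raise ValueError("memory_capacity must be at least 1")
    runs: List[List[int]] = []
    chunk: List[int] = []
    for key in keys:
        chunk.append(key)
        if len(chunk) == memory_capacity:
            # Chunk is full: sort it and emit it as a run
            # (a real system would write it to disk here).
            runs.append(sorted(chunk))
            chunk = []
    if chunk:
        # The final run may be shorter than memory_capacity.
        runs.append(sorted(chunk))
    return runs


def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Step 2: k-way merge of the sorted runs into one sorted stream."""
    return heapq.merge(*runs)


if __name__ == "__main__":
    runs = create_runs([5, 3, 8, 1, 9, 2, 7], memory_capacity=3)
    print(runs)
    print(list(merge_runs(runs)))
```

In this sketch, `memory_capacity` plays the role of the main-memory limit that forces the sort to be external. The model presented in this paper concerns only the run-creation step.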