Abstract

External sorting is a frequent operation in relational database systems, sometimes as a step in important operations such as joins. The performance of external sorting on a parallel system is therefore a key indicator of system performance for database applications. However, external sorting on multicomputers is not as well understood as parallel internal sorting, in which the keys reside in main memory. In many cases, analysis is performed under assumptions such as unlimited resources (number of processors, amount of memory, network bandwidth) and fully overlapped use of resources, which limits its applicability in practice. External sorting typically involves the creation of multiple sorted runs (Step 1) followed by merging of the sorted runs (Step 2). In this paper, we present an analytical model for Step 1 using pipelined sort on a message-passing multicomputer with shared disks. The model includes parameters representing system configuration, performance of system components, software-related choices, and problem size. The execution time predicted by the model is compared with experimental results on a transputer-based system recently reported by the authors [5]. Based on the model, the impact of system scale-up and faster components is investigated. The model is general enough for use in benchmarking other message-based machines.
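The two-step structure mentioned in the abstract (sorted-run creation followed by merging) can be illustrated with a minimal single-process sketch. This is not the pipelined, multicomputer sort that the paper models: in-memory lists stand in for disk-resident runs, and the names `create_runs`, `merge_runs`, `external_sort`, and `memory_capacity` are hypothetical, chosen only for illustration.

```python
import heapq
from typing import Iterable, Iterator, List


def create_runs(keys: Iterable[int], memory_capacity: int) -> List[List[int]]:
    """Step 1 (sketch): cut the input into memory-sized chunks and sort each.

    Assumption: each returned list stands in for one sorted run on disk.
    """
    runs: List[List[int]] = []
    chunk: List[int] = []
    for key in keys:
        chunk.append(key)
        if len(chunk) == memory_capacity:
            runs.append(sorted(chunk))  # internal sort of one memory load
            chunk = []
    if chunk:
        runs.append(sorted(chunk))  # final, possibly shorter run
    return runs


def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Step 2 (sketch): k-way merge of the sorted runs."""
    return heapq.merge(*runs)


def external_sort(keys: Iterable[int], memory_capacity: int) -> List[int]:
    """Run Step 1, then Step 2, and collect the merged output."""
    return list(merge_runs(create_runs(keys, memory_capacity)))
```

For example, with a hypothetical memory capacity of three keys, `create_runs([5, 2, 9, 1, 7, 3, 8], 3)` produces three runs, which `merge_runs` then combines into one sorted sequence. The analytical model described in the abstract covers only the run-creation step, and does so for a parallel pipelined setting rather than the sequential loop shown here.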
[1] A. Inkeri Verkamo et al. Performance comparison of distributive and mergesort as external sorting algorithms. J. Syst. Softw., 1989.
[2] Peter J. Varman et al. Merging Multiple Lists on Hierarchical-Memory Multiprocessors. J. Parallel Distributed Comput., 1991.
[3] Wouter Joosen et al. Evaluating Communication Overhead in Helios. 1990.
[4] Ferng-Ching Lin et al. Optimal Parallel External Merging under Hardware Constraints. ICPP, 1991.
[5] Ivan Luiz Marques Ricarte et al. External sorting on a reconfigurable message-passing multicomputer: experimental results and analysis. Proceedings of the 35th Midwest Symposium on Circuits and Systems, 1992.
[6] Wo-Shun Luk et al. An Analytic/Empirical Study of Distributed Sorting on a Local Area Network. IEEE Trans. Software Eng., 1989.
[7] Hamid Pirahesh et al. Parallelism in relational data base systems: architectural issues and design approaches. DPDS '90, 1990.