Design, analysis, and implementation of parallel external sorting algorithms

In recent years, parallel sorting has been an active field of research. However, the parallel sorting algorithms developed to date cannot be used to sort a large file, because they are internal sorting algorithms. Moreover, the fastest of these algorithms cannot be implemented with current technology. This thesis investigates parallel external sorting. Several new algorithms are presented and analyzed using a comprehensive cost model that includes computation, communication, and I/O. The I/O cost factor is especially critical for external sorting algorithms. While an extensive literature addresses computation and communication issues in parallel processing, the impact of I/O on the performance of parallel algorithms has not received adequate consideration. We propose several criteria for measuring this impact when the mass-storage device is a parallel read/write disk. As a major application of parallel external sorting, we consider the execution of complex database operations. In particular, we propose using a modified merge-sort to eliminate duplicate records in a large file. A combinatorial model is developed to provide an accurate estimate of the cost of the duplicate elimination operation in both the serial and the parallel cases.
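
The idea of eliminating duplicates with a modified merge-sort can be illustrated with a minimal serial sketch. It is not the thesis's algorithm: it runs in memory rather than on disk, assumes that two records are duplicates when they compare equal as whole values, and uses hypothetical names (`merge_unique`, `sort_unique`, `run_size`). It only shows the general principle that duplicates can be dropped during each merge step, so that runs shrink as the sort proceeds.

```python
def merge_unique(left, right):
    """Merge two sorted, duplicate-free runs into one sorted, duplicate-free run."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            # Equal records: keep one copy and advance both runs.
            out.append(left[i])
            i += 1
            j += 1
    # At most one run still has records; they are already sorted and unique.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def sort_unique(records, run_size=4):
    """Sort records and remove duplicates by repeated two-way merging.

    run_size (assumed >= 1) stands in for the number of records that fit
    in memory when the initial runs are formed.
    """
    # Initial runs: each chunk is sorted and deduplicated on its own.
    runs = [
        sorted(set(records[k:k + run_size]))
        for k in range(0, len(records), run_size)
    ]
    # Merge runs pairwise until a single run remains.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_unique(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out, carried to the next pass
        runs = merged
    return runs[0] if runs else []
```

For example, `sort_unique([5, 3, 5, 1, 3, 9, 1, 7, 7, 2])` returns `[1, 2, 3, 5, 7, 9]`. Because duplicates are removed as early as the run-formation and merge steps, later passes handle fewer records than a plain merge-sort followed by a separate duplicate scan would.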