A multi-pass algorithm for sorting extremely large data files

An extremely large data file is a file whose size exceeds the available main memory by several orders of magnitude. Sorting such a file requires an external sorting algorithm, which uses both the hard disk and main memory to accomplish the sorting task. Because the hard disk is much slower than main memory, the number of disk input/output operations is the primary performance metric. The proposed method decreases the total number of input/output operations and therefore reduces the total sorting time; it requires fewer disk read/write operations than existing approaches. The input/output complexity of the proposed algorithm is analyzed and compared with that of other algorithms. The proposed algorithm uses a constant merging order in the merge phase of the external sort, making multiple passes over each set of data. It is shown that the proposed algorithm has lower sorting time than previous approaches.
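The paper's specific constant-order, multi-pass merging scheme is not reproduced here, but the standard two-phase external merge sort it builds on can be sketched as follows. This is a minimal illustrative sketch, not the proposed algorithm: the function names, the one-integer-per-line file format, and the use of temporary files are all assumptions made for the example.

```python
import heapq
import os
import tempfile

def create_runs(path, memory_limit):
    # Phase 1 (run generation): read roughly memory_limit bytes of lines
    # at a time, sort each chunk in memory, and write it out as a run.
    runs = []
    with open(path) as f:
        while True:
            chunk = f.readlines(memory_limit)  # approximate byte budget
            if not chunk:
                break
            values = sorted(int(x) for x in chunk)
            run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
            run.write("\n".join(map(str, values)) + "\n")
            run.close()
            runs.append(run.name)
    return runs

def merge_runs(runs, out_path, k):
    # Phase 2 (merge): repeatedly k-way merge groups of runs until a
    # single sorted file remains. Each pass reads and writes every
    # record once, so the disk I/O cost is proportional to the number
    # of passes, roughly ceil(log_k(number of initial runs)).
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), k):
            group = runs[i:i + k]
            files = [open(r) for r in group]
            iters = [(int(line) for line in f) for f in files]
            merged = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
            for value in heapq.merge(*iters):
                merged.write(f"{value}\n")
            merged.close()
            for f, r in zip(files, group):
                f.close()
                os.remove(r)
            next_runs.append(merged.name)
        runs = next_runs
    os.replace(runs[0], out_path)

def external_sort(path, out_path, memory_limit, k):
    merge_runs(create_runs(path, memory_limit), out_path, k)
```

Keeping the merging order k constant across passes, as the proposed method does, fixes the fan-in of each merge step; the algorithmic question the paper addresses is how to schedule the resulting multiple passes so that the total number of disk read/write operations is minimized.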