A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms

We show a general relationship between the number of comparisons and the number of I/O-operations needed to solve a given problem. This relationship yields a lower bound on the number of I/O-operations needed to solve a problem whenever a lower bound on the number of comparisons is known. We use the result to show lower bounds on the I/O-complexity of a number of problems for which known techniques give only trivial bounds. Among these are the problem of removing duplicates from a multiset, which is of great importance in, e.g., relational database systems, and the problem of determining the mode (the most frequently occurring element) of a multiset. We develop algorithms for these problems in order to show that the lower bounds are tight.