Stable duplicate-key extraction with optimal time and space bounds

SummaryWe consider the problem of transforming a list L of records sorted on some key into two sublists L1 and L2 where, for each distinct key in L, L1 contains the first record of L that possesses the key and L2 contains all records of L with duplicate keys. We desire that our duplicate-key extraction algorithm perform the transformation in place and be stable (that is, records within each sublist must obey the original order given by L). This operation is useful in database and related file processing environments whenever only distinct keys need be considered. Moreover, stability in extraction insures that L can be efficiently restored at a later time with a stable merge of L1 and L2. Any procedure for performing duplicate-key extraction on a list of size n must require at least O(n) time and O(1) extra space, although the obvious algorithm for achieving either bound alone violates the other bound. We design a stable algorithm, using block-rearrangement techniques, and show that it is optimal in the theoretical sense that it achieves both lower bounds simultaneously. We also prove that its worst-case number of key comparisons and record exchanges sum to no more than 6 n, suggesting that the algorithm has practical application as well.