An Empirical Evaluation of Consensus Rules for Molecular Sequences

We investigate relationships among several consensus methods for molecular sequences: c(P*), the containing subset method of Gordon (1993); gp, the generalized plurality rule method of Day and McMorris (1992a); and sp, a method based on the simple plurality rule. When presented with a profile of k symbols, each method returns a consensus result for that profile. Since we are particularly interested in comparing c(P*) with gp, we use c(P*) to define cg, a consensus method equal to c(P*) with P* chosen, for each k, to minimize the number of inputs on which c(P*) and gp return different results. For 4 < k < 50, cg and gp agree on at least 68% of the distinct inputs. Since the mathematical bases of c(P*) and gp are fundamentally different, the extent to which cg and gp agree is perplexing.
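
The comparison procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this study. It makes several assumptions: the alphabet is the four nucleotides, "distinct inputs" are unordered profiles (multisets) of k symbols, and the simple plurality rule returns every symbol tied for the highest count. The function threshold_rule is a hypothetical one-parameter rule that only stands in for a parameterized method such as c(P*); it is not Gordon's containing subset method, and no implementation of gp is attempted. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement

ALPHABET = "ACGT"  # assumption: a four-letter nucleotide alphabet


def simple_plurality(profile):
    """Return the set of symbols that occur most often in the profile."""
    counts = Counter(profile)
    top = max(counts.values())
    return frozenset(s for s, c in counts.items() if c == top)


def threshold_rule(profile, p):
    """Hypothetical stand-in for a parameterized method (NOT c(P*)):
    return every symbol whose count is at least p * k."""
    counts = Counter(profile)
    k = len(profile)
    return frozenset(s for s, c in counts.items() if c >= p * k)


def distinct_profiles(k, alphabet=ALPHABET):
    """Enumerate unordered profiles of k symbols (assumed notion of 'distinct')."""
    return combinations_with_replacement(alphabet, k)


def disagreements(method_a, method_b, k):
    """Count the distinct profiles of size k on which two methods differ."""
    return sum(1 for prof in distinct_profiles(k)
               if method_a(prof) != method_b(prof))


def best_parameter(k, candidates, reference):
    """Pick the candidate parameter that minimizes disagreement with a
    reference method at this k, mirroring how cg is derived from c(P*)."""
    return min(
        candidates,
        key=lambda p: disagreements(
            lambda prof: threshold_rule(prof, p), reference, k),
    )
```

Calling best_parameter(5, [0.25, 0.5, 0.75], simple_plurality), for example, selects the stand-in parameter that differs least from sp on profiles of five symbols. The agreement percentage for that k is then one minus the number of disagreements divided by the number of distinct profiles.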