Likelihood vs. Information in Aligning Biopolymer Sequences

Biopolymer sequences often contain regions of similarity with other sequences due to homology or common function. A common method of discovering patterns in biopolymer sequences is to align a set of sequences so that certain columns of the alignment have highly non-random residue frequency distributions. The pattern can then be described in terms of a consensus pattern, motif, profile, specificity matrix, or regular expression. This research note shows that a commonly used method of measuring the "goodness" of an alignment based on information theory is actually equivalent to maximizing the likelihood ratio of two hypotheses when the assumed probability distribution is multinomial. In addition, a method that has been used by other workers for determining whether a new sequence contains the pattern is shown to be essentially equivalent to a likelihood ratio. This offers a new, uniform way of thinking about the information contained in a set of aligned sequences that is more intuitive and may aid the development of improved algorithms.
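
The claimed equivalence can be illustrated numerically for a single alignment column. The following is a minimal sketch, not the note's own derivation. It assumes that the information measure is the relative entropy of the observed column frequencies against background frequencies, and that the two hypotheses are "residues drawn from the background distribution" versus "residues drawn from a column-specific multinomial whose parameters are the observed frequencies". The function names, the uniform DNA background, and the example column are all hypothetical.

```python
import math
from collections import Counter


def column_information(column, background):
    """Relative entropy (bits) of observed column frequencies vs. background.

    Assumed form of the information measure: sum_i f_i * log2(f_i / p_i).
    """
    n = len(column)
    counts = Counter(column)
    return sum(
        (c / n) * math.log2((c / n) / background[r]) for r, c in counts.items()
    )


def column_log_likelihood_ratio(column, background):
    """Log2 likelihood ratio: column-specific multinomial vs. background.

    The multinomial coefficient is identical under both hypotheses and
    cancels in the ratio, so only the product terms are computed.
    """
    n = len(column)
    counts = Counter(column)
    # Log-likelihood under the column-specific model (observed frequencies).
    log_l_pattern = sum(c * math.log2(c / n) for c in counts.values())
    # Log-likelihood under the background model.
    log_l_background = sum(c * math.log2(background[r]) for r, c in counts.items())
    return log_l_pattern - log_l_background


# Hypothetical uniform DNA background and one example column of 8 residues.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
column = "AAAAAAGA"

info = column_information(column, background)
llr = column_log_likelihood_ratio(column, background)

# Under these assumptions the two quantities differ only by the factor N.
print(math.isclose(llr, len(column) * info))
```

Under these assumptions the log-likelihood ratio equals the number of sequences times the column's information content, so ranking alignments of a fixed number of sequences by one quantity ranks them identically by the other.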