Analysis of conserved domains and sequence motifs in cellular regulatory proteins and locus control regions using new software tools for multiple alignment and visualization.

With the tremendous expansion of molecular sequence data in recent years, multiple alignment has become arguably one of the two most important techniques in sequence analysis (the other being fast database searching). Several useful approaches to this problem have been developed, but they are often limited to a subset of multiple-alignment applications and cannot easily handle the complex structural organization seen in a growing number of sequences. For example, a single sequence may contain several domains of different evolutionary origins, and the number of copies and the relative order of these domains may differ considerably among related sequences. Here we describe an integrated set of interactive Unix tools that combines several multiple-alignment techniques with traditional "dot-plot" visualization to provide a flexible environment for complex sequence analysis problems. We apply these tools to identifying and characterizing "catalytic" domains in ras and rho/rac GTPase-activating proteins, "Src homology" (SH2 and SH3) domains in cytoplasmic signaling proteins, repetitive sequence motifs in the alpha and beta subunits of protein prenyltransferases, and regulatory DNA sequences in the locus control region of the beta-globin gene cluster.
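The dot-plot idea mentioned above can be illustrated with a minimal sketch. It marks cell (i, j) whenever the length-`word` substring starting at position i of one sequence exactly matches the one starting at position j of the other, so a domain present twice in one sequence appears as two parallel diagonals. The function names `dot_plot` and `render`, the exact word-match rule, and the toy sequences are illustrative assumptions, not the tools described in this work.

```python
def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> list[list[bool]]:
    """Hypothetical helper: True at (i, j) when seq_a[i:i+word] == seq_b[j:j+word]."""
    rows = max(len(seq_a) - word + 1, 0)
    cols = max(len(seq_b) - word + 1, 0)
    # Index every word of seq_b by its start positions for fast lookup.
    index: dict[str, list[int]] = {}
    for j in range(cols):
        index.setdefault(seq_b[j:j + word], []).append(j)
    grid = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        # Mark every column whose word exactly matches the word at row i.
        for j in index.get(seq_a[i:i + word], []):
            grid[i][j] = True
    return grid


def render(grid: list[list[bool]]) -> str:
    """Draw matches as '*' and mismatches as '.', one text line per row."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in grid)


if __name__ == "__main__":
    # Toy example: the motif "ABCD" occurs twice in the first sequence.
    print(render(dot_plot("ABCDXABCD", "ABCD", word=3)))
```

Running it prints two short diagonals, one per copy of the motif. Longer word lengths suppress chance matches at the cost of missing diverged copies; practical dot-plot tools typically score similarity rather than requiring exact identity.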