Potential structural motifs for reverse transcriptases.

Recently, Xiong and Eickbush (1988) generated a detailed primary sequence alignment of RNA-directed DNA polymerases (rd’s). The rd’s, or reverse transcriptases, are essential enzymes of retroviruses, and recent evidence for reverse transcription has been obtained for other genetic elements, including DNA viruses, transposable elements, and introns (see refs. in Xiong and Eickbush 1988). The Xiong and Eickbush study extends previous work, bringing into a single alignment the rd sequences of these distant elements. Their primary purpose was to reconstruct the potential evolutionary relationship among these sequences. We independently generated a similar alignment for the distantly related viral and LTR-retrotransposon rd’s in order to model the potentially conserved structural motif(s). Our approach and result can be viewed as a logical extension of Xiong and Eickbush’s work. The combining of sequence similarity with predicted structure to optimize alignments has been reported elsewhere (Nishikawa and Ooi 1986; Webster et al. 1987).

Our alignment of the distant rd’s was generated in two steps. First, pairwise sequence comparisons of representative proteins of major clusters were carried out using a modification of the dynamic programming algorithm (Smith and Waterman 1981). This algorithm generates optimal alignments from primary sequence information annotated with neighborhood structural information. Second, the initial sequence alignments, with their predicted secondary-structure annotation, were used to construct complex pattern descriptors for these sequences (for a general discussion of our method for pattern descriptor construction, see Webster et al. 1987, 1988). All descriptor matches within the rd sequences and to the entire NBRF/PIR (George et al. 1986) version 16 data base were identified.
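
As a rough illustration of the dynamic programming step, the sketch below implements plain local alignment in the style of Smith and Waterman (1981). It is not the modified algorithm described above: it scores primary sequence only, with no structural annotation, and the scoring values (`match`, `mismatch`, `gap`) and the two toy peptide strings are assumptions chosen purely for demonstration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring is a simple match/mismatch scheme with a linear gap penalty;
    these values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (hypothetical) sharing a six-residue core.
score, top, bottom = smith_waterman("AAYMDDLLV", "GGYMDDLLC")
print(score, top, bottom)
```

A structure-aware variant would presumably replace the match/mismatch term with a score that also reflects the predicted secondary-structure state at each position; how the authors weight that term is not stated here.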
Finally, for each descriptor, the sensitivities (% rd’s that match the descriptor) and specificities (% non-rd’s in the NBRF/PIR version 16 data base that do not match the descriptor) were estimated. On the basis of this analysis, there appear to be at least four common blocks of structural similarity among distant rd sequences (fig. 1). These blocks are labeled 2-5 (following the nomenclature of Xiong and Eickbush). They are displayed relative to the sequence of HIV1 (Sanchez-Pescador et al. 1985) in figure 1. Block 5 is the most strongly conserved and has elsewhere been characterized as a DD dipeptide followed by several hydrophobic residues (reviewed in Baltimore 1985). Our optimized pattern for seven amino acids of this region of block 5 is [WYF][ILVMWYFC]DD[ILV][ILVMWYFC][ILVMWYFC], with predicted beta strands flanking the DD dipeptide within four amino acids (see legend to fig. 1). This descriptor has an estimated 100% sensitivity and 100% specificity for the rd’s. Given the shortness of the loop, this region is likely to form a beta hairpin (Milner-White and Poet 1987; Argos 1988). A second pattern descriptor (see legend to fig. 1) characterizes blocks 2-4 with a sensitivity of 80% and a specificity of 99.9%. Inclusion of the secondary-structure elements in the descriptor increases its sensitivity and specificity correlation coefficient by 23%. Alignment of the regions of the rd’s (listed in the legend to fig. 1) that match the two
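
The sequence portion of the block 5 descriptor, and the sensitivity and specificity definitions given above, can be sketched as follows. This is a minimal illustration under stated assumptions: the regular expression covers only the seven-residue pattern quoted in the text and ignores the predicted beta-strand constraints that the full descriptor includes; the peptide strings in `POSITIVES` and `NEGATIVES` are invented toy data, not sequences from the NBRF/PIR data base; and the function names are hypothetical.

```python
import re

# Seven-residue block 5 pattern from the text, as a regular expression.
# The secondary-structure part of the descriptor is NOT modeled here.
BLOCK5 = re.compile(r"[WYF][ILVMWYFC]DD[ILV][ILVMWYFC][ILVMWYFC]")


def matches(seq):
    """True if the sequence contains the block 5 pattern anywhere."""
    return BLOCK5.search(seq) is not None


def sensitivity_specificity(positives, negatives):
    """Return (sensitivity %, specificity %) as defined in the text.

    sensitivity = % of known rd sequences that match the descriptor
    specificity = % of non-rd sequences that do not match the descriptor
    """
    true_pos = sum(matches(s) for s in positives)
    true_neg = sum(not matches(s) for s in negatives)
    return (100.0 * true_pos / len(positives),
            100.0 * true_neg / len(negatives))


# Invented toy data: three "rd-like" strings match, one does not.
POSITIVES = ["QYMDDLYVG", "LLYVDDILLA", "TFIDDVLIC", "KQGDDAAA"]
NEGATIVES = ["MKTAYIAKQR", "GSHMDDEEK"]

print(sensitivity_specificity(POSITIVES, NEGATIVES))
```

On real data the two lists would be the set of known rd sequences and the remainder of the data base, respectively; the toy lists here only exercise the arithmetic.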

[1] Brendan A. Larder, et al. Site-specific mutagenesis of AIDS virus reverse transcriptase, 1987, Nature.

[2] Kathryn E. Sidman, et al. The protein identification resource (PIR), 1986, Nucleic acids research.

[3] G. Gerard, et al. Substrate binding domain of murine leukemia virus reverse transcriptase. Identification of lysine 103 and lysine 421 as binding site residues, 1988, The Journal of biological chemistry.

[4] R. H. Lathrop, et al. Pattern descriptors and the unidentified reading frame 6 human mtDNA dinucleotide-binding site, 1988, Proteins.

[5] J. Sun, et al. Reverse transcriptase in a clinical strain of Escherichia coli: production of branched RNA-linked msDNA, 1989, Science.

[6] Ron Poet, et al. Loops, bulges, turns and hairpins in proteins, 1987.

[7] P. Argos, et al. A sequence motif in many polymerases, 1988, Nucleic acids research.

[8] R. H. Lathrop, et al. Prediction of a common structural domain in aminoacyl-tRNA synthetases through use of a new pattern-directed inference system, 1987, Biochemistry.

[9] D. Baltimore. Retroviruses and Retrotransposons: The role of reverse transcription in shaping the eukaryotic genome, 1985, Cell.

[10] K. Steimer, et al. Nucleotide sequence and expression of an AIDS-associated retrovirus (ARV-2), 1985, Science.

[11] R. M. Abarbanel, et al. Turn prediction in proteins using a pattern-matching approach, 1986, Biochemistry.

[12] M. A. McClure, et al. Computer analysis of retroviral pol genes: assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes, 1986, Proceedings of the National Academy of Sciences of the United States of America.

[13] Teresa A. Webster, et al. A modified Chou and Fasman protein structure algorithm, 1987, Comput. Appl. Biosci.

[14] T. Eickbush, et al. Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns, 1988, Molecular biology and evolution.

[15] P. Argos, et al. Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses, 1984, Nucleic acids research.

[16] K. Nishikawa, et al. Amino acid sequence homology applied to the prediction of protein secondary structures, and joint prediction with existing methods, 1986, Biochimica et biophysica acta.

[17] J. Wooley, et al. Set of novel, conserved proteins fold pre-messenger RNA into ribonucleosomes, 1986, Proteins.

[18] Richard H. Lathrop, et al. ARIADNE: pattern-directed inference and hierarchical abstraction in protein structure recognition, 1987, CACM.