Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases

This paper describes an approach that combines multiple structure alignments and multiple sequence alignments to generate sequence profiles for protein families. First, a multiple sequence alignment is generated for each family member of known three-dimensional structure, using sequences that are closely related to it. These alignments are then merged through a multiple structure alignment of the family members of known structure. The merged alignment is used to generate a Hidden Markov Model for the family in question. The Hidden Markov Model can be used to search for new family members or to improve alignments for distantly related family members that have already been identified. Application of a profile generated for SH2 domains indicates that the Janus family of nonreceptor protein tyrosine kinases contains SH2 domains. This conclusion is strongly supported by the results of secondary-structure prediction programs, threading calculations, and the analysis of comparative models generated for these domains. One of the Janus kinases, human TYK2, has an SH2 domain that contains a histidine instead of the conserved arginine at the key phosphotyrosine-binding position, βB5. Calculations of the pKa values of the βB5 arginines in a number of SH2 domains and of the βB5 histidine in a homology model of TYK2 suggest that this histidine is likely to be neutral around pH 7, indicating that it may have lost the ability to bind phosphotyrosine. If this is indeed the case, TYK2 may contain a domain with an SH2 fold that has a modified binding specificity.
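The merging step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`project_onto_anchor`, `merge_alignments`), the dictionary-based data layout, and the toy sequences are all assumptions made for illustration. The sketch also assumes that each per-structure sequence alignment contains its structure's own sequence (the "anchor"), and it simplifies by discarding columns in which the anchor has a gap, that is, insertions relative to the anchor.

```python
def project_onto_anchor(msa, anchor):
    """Keep only the MSA columns in which the anchor sequence has a residue.

    Simplifying assumption: insertions relative to the anchor are dropped.
    """
    keep = [i for i, ch in enumerate(msa[anchor]) if ch != "-"]
    return {name: "".join(seq[i] for i in keep) for name, seq in msa.items()}


def merge_alignments(structure_alignment, family_alignments):
    """Merge per-anchor sequence alignments through a structure alignment.

    structure_alignment: anchor name -> gapped row (all rows equal length).
    family_alignments:   anchor name -> {sequence name -> gapped row}, where
                         each alignment includes the anchor under its own name.
    """
    if len({len(row) for row in structure_alignment.values()}) != 1:
        raise ValueError("structure alignment rows differ in length")

    merged = {}
    for anchor, struct_row in structure_alignment.items():
        projected = project_onto_anchor(family_alignments[anchor], anchor)
        if projected[anchor] != struct_row.replace("-", ""):
            raise ValueError(f"anchor {anchor!r} differs between alignments")
        for name, seq in projected.items():
            residues = iter(seq)
            # Copy the anchor's gap pattern from the structure alignment
            # onto every sequence aligned to that anchor.
            merged[name] = "".join(
                next(residues) if ch != "-" else "-" for ch in struct_row
            )
    return merged


# Toy data (hypothetical sequences, not real SH2 domains).
structure_alignment = {"A": "AC-DE", "B": "ACG-E"}
family_alignments = {
    "A": {"A": "A-CDE", "a1": "AGCDQ"},
    "B": {"B": "ACGE", "b1": "SC-E"},
}

for name, row in merge_alignments(structure_alignment, family_alignments).items():
    print(f"{name:>3} {row}")
```

Every row of the merged alignment has the length of the structure alignment, so it could serve as input to a profile Hidden Markov Model builder. Sequence names are assumed to be unique across families; a production version would also need to decide how to retain the insertion columns that this sketch drops.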
