SMART, a simple modular architecture research tool: identification of signaling domains.

Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 more than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations, such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors, further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes, such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts, such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications, exemplified by a predicted pleckstrin homology domain in a Candida albicans protein previously described as an integrin.
