Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs.

Positional cloning has already produced the sequences of more than 70 human genes associated with specific diseases. In addition to their medical importance, these genes are of interest as a set of human genes isolated solely on the basis of the phenotypic effect of the respective mutations. We analyzed the protein sequences encoded by the positionally cloned disease genes using an iterative strategy combining several sensitive computer methods. Comparisons to complete sequence databases and to separate databases of nematode, yeast, and bacterial proteins showed that for most of the disease gene products, statistically significant sequence similarities are detectable in each of the model organisms. Only the nematode genome encodes apparent orthologs with conserved domain architecture for the majority of the disease genes. In yeast and bacterial homologs, domain organization is typically not conserved, and sequence similarity is limited to individual domains. Generally, human genes complement mutations only in orthologous yeast genes. Most of the positionally cloned genes encode large proteins with several globular and nonglobular domains, the functions of some or all of which are not known. We detected conserved domains and motifs not described previously in a number of proteins encoded by disease genes and predicted functions for some of them. These predictions include an ATP-binding domain in the product of the hereditary nonpolyposis colon cancer gene (a MutL homolog), which is conserved in the Hsp90 family of chaperone proteins, type II DNA topoisomerases, and histidine kinases, and a nuclease domain homologous to bacterial RNase D and the 3'-5' exonuclease domain of DNA polymerase I in the Werner syndrome gene product.
