Ancient conserved regions in new gene sequences and the protein databases.

Sets of new gene sequences from human, nematode, and yeast were compared with each other and with a set of Escherichia coli genes in order to detect ancient evolutionarily conserved regions (ACRs) in the encoded proteins. Nearly all of the ACRs so identified were found to be homologous to sequences in the protein databases. This suggests that currently known proteins may already include representatives of most ACRs and that new sequences not similar to any database sequence are unlikely to contain ACRs. Preliminary analyses indicate that moderately expressed genes may be more likely to contain ACRs than rarely expressed genes. It is estimated that there are fewer than 900 ACRs in all.
