The complement of enzymatic sets in different species.

We present here a comprehensive analysis of the enzyme complement of a large variety of species. Because enzymes are a relatively conserved group, several classification systems are available that are common to all species and link a protein sequence to an enzymatic function. Enzymes are therefore an ideal functional group in which to study the relationship between sequence expansion, functional divergence and phenotypic change. By combining information from the well-annotated SWISS-PROT database, sequence data from a variety of fully sequenced genomes and the EC functional scheme, we aimed to estimate the fraction of enzymes in genomes, to determine the extent of their functional redundancy in the different domains of life, and to identify functional innovations and lineage-specific expansions in the metazoan lineage. We found that prokaryotic and eukaryotic species differ both in the fraction of enzymes in their genomes and in the pattern of expansion of their enzymatic sets. We observe an increase in functional redundancy accompanying an increase in species complexity, and we performed a quantitative assessment to determine the degree of functional redundancy in different species. Finally, we report a massive expansion in the number of mammalian enzymes involved in signalling and degradation.
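The two quantities named in the abstract, the fraction of enzymes in a genome and the degree of functional redundancy, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not define redundancy, so it is taken to be the mean number of proteins per distinct EC number, and the function names, the input layout (protein identifier mapped to a set of EC numbers) and the toy proteome are hypothetical rather than the authors' method or data.

```python
from collections import Counter


def enzyme_fraction(proteome):
    """Fraction of proteins annotated with at least one EC number.

    `proteome` maps a protein ID to a set of EC numbers; an empty set
    means the protein is not annotated as an enzyme (assumed layout).
    """
    if not proteome:
        return 0.0
    n_enzymes = sum(1 for ec_numbers in proteome.values() if ec_numbers)
    return n_enzymes / len(proteome)


def functional_redundancy(proteome):
    """Mean number of proteins per distinct EC number.

    This definition is an assumption for illustration; the abstract
    does not state how redundancy was quantified.
    """
    counts = Counter(ec for ec_numbers in proteome.values() for ec in ec_numbers)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


# Hypothetical toy proteome: two kinases sharing one EC number,
# one protease, and one non-enzymatic protein.
toy_proteome = {
    "P1": {"2.7.11.1"},
    "P2": {"2.7.11.1"},
    "P3": {"3.4.21.4"},
    "P4": set(),
}

print(enzyme_fraction(toy_proteome))
print(functional_redundancy(toy_proteome))
```

Under this assumed definition, a value of 1.0 would mean every enzymatic function is carried by exactly one protein, and larger values indicate more proteins sharing the same EC number.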
