Enzyme-specific profiles for genome annotation: PRIAM.

The availability of fully sequenced genomes makes it possible to reconstruct metabolic pathways by identifying enzyme-coding genes. Here we describe PRIAM, a method for automated enzyme detection in a fully sequenced genome, based on the classification of enzymes in the ENZYME database. PRIAM relies on sets of position-specific scoring matrices ('profiles') automatically tailored for each ENZYME entry. Automatically generated logical rules define which of these profiles are required in order to infer the presence of the corresponding enzyme in an organism. As an example, PRIAM was applied to identify potential metabolic pathways from the complete genome of the nitrogen-fixing bacterium Sinorhizobium meliloti. The results of this automated method were compared with the original genome annotation and visualised on KEGG graphs in order to facilitate the interpretation of metabolic pathways and to highlight potentially missing enzymes.
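
The idea of logical rules over profiles can be illustrated with a minimal sketch. It is not PRIAM's actual rule format or implementation: the rule encoding (a list of alternative profile sets per enzyme), the function name `infer_enzymes`, and the profile identifiers such as `P001` are all hypothetical, chosen only to show how a set of detected profiles might be turned into a set of inferred enzymes.

```python
from typing import Dict, List, Set

# Hypothetical rule encoding (an assumption, not PRIAM's real format):
# each EC number maps to a list of alternatives, and each alternative is
# a set of profile IDs that must ALL be detected in the genome.
Rules = Dict[str, List[Set[str]]]


def infer_enzymes(rules: Rules, detected_profiles: Set[str]) -> Set[str]:
    """Return the EC numbers for which at least one alternative is fully matched."""
    return {
        ec
        for ec, alternatives in rules.items()
        if any(required <= detected_profiles for required in alternatives)
    }


# Illustrative rules with made-up profile identifiers.
rules: Rules = {
    "1.1.1.1": [{"P001"}, {"P002", "P003"}],  # P001 alone, OR P002 AND P003
    "2.7.1.1": [{"P010", "P011"}],            # both profiles required
}

# Profiles assumed to have scored above threshold somewhere in the genome.
detected = {"P002", "P003", "P010"}

present = infer_enzymes(rules, detected)
print(sorted(present))  # ['1.1.1.1']
```

In this sketch the second enzyme is not inferred because only one of its two required profiles was detected; enzymes left unresolved in this way are the kind of candidates one might flag as potentially missing when inspecting a pathway.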
