BrEPS: a flexible and automatic protocol to compute enzyme-specific sequence profiles for functional annotation

BackgroundModels for the simulation of metabolic networks require the accurate prediction of enzyme function. Based on a genomic sequence, enzymatic functions of gene products are today mainly predicted by sequence database searching and operon analysis. Other methods can support these techniques: We have developed an automatic method "BrEPS" that creates highly specific sequence patterns for the functional annotation of enzymes.ResultsThe enzymes in the UniprotKB are identified and their sequences compared against each other with BLAST. The enzymes are then clustered into a number of trees, where each tree node is associated with a set of EC-numbers. The enzyme sequences in the tree nodes are aligned with ClustalW. The conserved columns of the resulting multiple alignments are used to construct sequence patterns. In the last step, we verify the quality of the patterns by computing their specificity. Patterns with low specificity are omitted and recomputed further down in the tree. The final high-quality patterns can be used for functional annotation. We ran our protocol on a recent Swiss-Prot release and show statistics, as well as a comparison to PRIAM, a probabilistic method that is also specialized on the functional annotation of enzymes. We determine the amount of true positive annotations for five common microorganisms with data from BRENDA and AMENDA serving as standard of truth. BrEPS is almost on par with PRIAM, a fact which we discuss in the context of five manually investigated cases.ConclusionsOur protocol computes highly specific sequence patterns that can be used to support the functional annotation of enzymes. The main advantages of our method are that it is automatic and unsupervised, and quite fast once the patterns are evaluated. The results show that BrEPS can be a valuable addition to the reconstruction of metabolic networks.

[1]  S. Teichmann,et al.  The evolution of domain arrangements in proteins and interaction networks , 2005, Cellular and Molecular Life Sciences CMLS.

[2]  Christian J. A. Sigrist,et al.  Nucleic Acids Research Advance Access published November 14, 2007 The 20 years of PROSITE , 2007 .

[3]  N. Mulder,et al.  Tools and resources for identifying protein families, domains and motifs , 2001, Genome Biology.

[4]  C. Claudel-Renard,et al.  Enzyme-specific profiles for genome annotation: PRIAM. , 2003, Nucleic acids research.

[5]  Andrea Califano,et al.  CASTOR: Clustering Algorithm for Sequence Taxonomical Organization and Relationships , 2003, J. Comput. Biol..

[6]  C. A. D. Spring,et al.  Identifizierung ähnlicher Reaktionsmechanismen in homologen Enzymen unterschiedlicher Funktion unter Verwendung konservierter Sequenzdomänen , 2006 .

[7]  D. Higgins,et al.  Finding flexible patterns in unaligned protein sequences , 1995, Protein science : a publication of the Protein Society.

[8]  B. Rost Enzyme function less conserved than anticipated. , 2002, Journal of molecular biology.

[9]  R Apweiler,et al.  Clustering and analysis of protein families. , 2001, Current opinion in structural biology.

[10]  Langer Kamp,et al.  BrEPS: a flexible and automatic protocol to compute enzyme-specific sequence profiles for functional annotation , 2010 .

[11]  Burkhard Rost,et al.  Domains, motifs and clusters in the protein universe. , 2003, Current opinion in chemical biology.

[12]  Terri K. Attwood,et al.  The Role of Pattern Databases in Sequence Analysis , 2000, Briefings Bioinform..

[13]  Amos Bairoch,et al.  The ENZYME database in 2000 , 2000, Nucleic Acids Res..

[14]  Adrian Welfle Erweiterte Identifizierung, automatische Generierung und Analyse von konservierten Sequenzmustern und vergleichende Analyse enzymatischer Reaktionen unter Verwendung von homologen Enzymdomänen , 2008 .

[15]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[16]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[17]  Antje Chang,et al.  BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009 , 2008, Nucleic Acids Res..

[18]  The UniProt Consortium,et al.  The Universal Protein Resource (UniProt) 2009 , 2008, Nucleic Acids Res..

[19]  John C. Wootton,et al.  Statistics of Local Complexity in Amino Acid Sequences and Sequence Databases , 1993, Comput. Chem..

[20]  Eric D. Scheeff,et al.  Fundamentals of protein structure. , 2003, Methods of biochemical analysis.

[21]  David R. Gilbert,et al.  Approaches to the Automatic Discovery of Patterns in Biosequences , 1998, J. Comput. Biol..

[22]  Alexander Schliep,et al.  ProClust: improved clustering of protein sequences with an extended graph-based approach , 2002, ECCB.