Automatic detection of subsystem/pathway variants in genome analysis

MOTIVATION: Proteins work together in pathways and networks that collectively make up the cellular machinery. A subsystem (a generalization of the pathway concept) is a group of related functional roles (such as enzymes) jointly involved in a specific aspect of the cellular machinery. Subsystems provide a natural framework for comparative genome analysis and functional annotation. A subsystem may be implemented in a number of different functional variants in individual species. To project functional assignments reliably across multiple genomes, we must be able to identify the variants implemented in each genome. The analysis of such variants across diverse species is an interesting problem in itself and may provide new evolutionary insights. However, no computational techniques are presently available for the automated detection and analysis of subsystem variants.

RESULTS: Here we formulate subsystem variant detection as the problem of finding the minimum number of subgraphs of a subsystem, which is represented as a graph, and solve this optimization problem with an integer programming approach. The performance of our method was tested on subsystems encoded in the SEED, a genomic integration platform developed by the Fellowship for Interpretation of Genomes as a component of a large-scale effort on comparative analysis and annotation of multiple diverse genomes. We illustrate the results obtained for two expert-encoded subsystems: the biosynthesis of Coenzyme A and of the FMN/FAD cofactors. Applications of variant detection, both to support genomic annotation and to assess the divergence of species, are briefly discussed in the context of these universally conserved and essential metabolic subsystems.

SUPPLEMENTARY INFORMATION: The details of the variant detection results are available at http://ffas.burnham.org/svar/supp.html.
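The abstract does not spell out the integer program, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that each genome is summarized by the set of functional roles it contains, that candidate variants are given as role sets (the node sets of subgraphs), and that a variant explains a genome when all of its roles are present. Under those assumptions, picking the fewest variants that explain every genome is a 0-1 covering problem. All role labels, genome names, and variant names are made up for illustration, and exhaustive enumeration stands in for an integer programming solver.

```python
from itertools import combinations

# Toy, made-up input: functional roles observed in each genome (hypothetical labels).
genome_roles = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B", "C", "D"},
    "g3": {"A", "D", "E"},
    "g4": {"A", "D", "E", "B"},
}

# Hypothetical candidate variants: each is a set of roles (nodes of a subgraph).
candidate_variants = {
    "v1": {"A", "B", "C"},
    "v2": {"A", "D", "E"},
    "v3": {"A", "B", "C", "D"},
    "v4": {"A", "B"},
}


def covers(variant_roles, observed_roles):
    """Assumed matching rule: a variant explains a genome if all its roles are present."""
    return variant_roles <= observed_roles


def min_variant_cover(genomes, variants):
    """Return a smallest list of variant names that explains every genome.

    Equivalent 0-1 program: minimize sum_v x_v subject to
    sum_{v covers g} x_v >= 1 for every genome g, with x_v in {0, 1}.
    Solved here by enumerating subsets in order of increasing size,
    which is only practical for tiny inputs.
    """
    names = sorted(variants)
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            if all(any(covers(variants[v], roles) for v in chosen)
                   for roles in genomes.values()):
                return list(chosen)
    return None  # no subset of the candidates explains all genomes


def assign(genomes, variants, chosen):
    """Map each genome to the largest chosen variant that explains it."""
    out = {}
    for g, roles in genomes.items():
        fits = [v for v in chosen if covers(variants[v], roles)]
        out[g] = max(fits, key=lambda v: (len(variants[v]), v))
    return out


chosen = min_variant_cover(genome_roles, candidate_variants)
assignment = assign(genome_roles, candidate_variants, chosen)
print(chosen)
print(assignment)
```

On realistic inputs the enumeration would be replaced by a proper integer programming solver; the objective and constraints in the docstring carry over unchanged.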
