The Use of Weighted Graphs for Large-Scale Genome Analysis

There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we have applied them to a survey of prokaryotic metabolism. To demonstrate the utility of this approach, we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.
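To show why a weighted graph avoids the pair-wise bottleneck, here is a minimal sketch of something in the spirit of a taxonomic weighted graph. It is an illustration under stated assumptions, not the authors' definition. It assumes that each genome's metabolic network is given as a set of undirected edges between enzyme identifiers (EC numbers), and that an edge's weight is the fraction of genomes in which that edge occurs. The function name `build_taxonomic_graph` and the toy data are hypothetical. The graph is built in a single pass over the genomes, so the cost grows linearly with the number of genomes instead of quadratically as with all-against-all comparisons.

```python
from collections import Counter


def build_taxonomic_graph(genome_networks):
    """Merge per-genome enzyme networks into one weighted graph.

    genome_networks: dict mapping a genome id to a set of edges, each edge
    a pair of enzyme identifiers (e.g. EC numbers).
    Returns a dict mapping each undirected edge to the fraction of genomes
    containing it (an assumed weighting, for illustration only).
    """
    counts = Counter()
    for edges in genome_networks.values():
        # Normalize each undirected edge so (a, b) and (b, a) count as one,
        # and count each edge at most once per genome.
        counts.update({tuple(sorted(edge)) for edge in edges})
    n_genomes = len(genome_networks)
    return {edge: c / n_genomes for edge, c in counts.items()}


# Hypothetical toy data: three genomes sharing parts of one small pathway.
networks = {
    "genome_A": {("1.1.1.1", "2.7.1.1"), ("2.7.1.1", "5.3.1.9")},
    "genome_B": {("2.7.1.1", "1.1.1.1")},
    "genome_C": {("5.3.1.9", "2.7.1.1"), ("1.1.1.1", "2.7.1.1")},
}
g = build_taxonomic_graph(networks)
for edge, weight in sorted(g.items()):
    print(edge, round(weight, 3))
```

An edge present in every genome gets weight 1.0, while lineage-specific edges get lower weights. Isoenzymatic or sequence-similarity weights could be accumulated in the same single pass by changing what is counted per edge or per node.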
