Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has proved difficult to construct networks in which edges represent LGT alone. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets that differ in size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has occurred within these GECs. Using Gene Ontology enrichment tests, we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to changes in k.
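
The clique-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each LGT network has already been reduced to an undirected edge list per value of k, uses a plain Bron-Kerbosch enumeration with pivoting, and treats any maximal clique of at least three nodes as a candidate GEC. The node labels, the example values of k, the size threshold `min_size` and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map; direction of transfer is ignored (assumption)."""
    graph: Graph = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def maximal_cliques(graph: Graph) -> List[FrozenSet[str]]:
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(graph[n] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    if graph:
        expand(set(), set(graph), set())
    return cliques


def maximum_cliques(graph: Graph) -> List[FrozenSet[str]]:
    """Return the maximal cliques of largest size."""
    found = maximal_cliques(graph)
    if not found:
        return []
    largest = max(len(c) for c in found)
    return [c for c in found if len(c) == largest]


def nodes_in_cliques_for_all_k(
    edges_by_k: Dict[int, List[Tuple[str, str]]], min_size: int = 3
) -> Set[str]:
    """Nodes that sit in some maximal clique of >= min_size nodes for every k."""
    per_k = []
    for edges in edges_by_k.values():
        members: Set[str] = set()
        for clique in maximal_cliques(build_graph(edges)):
            if len(clique) >= min_size:
                members |= clique
        per_k.append(members)
    return set.intersection(*per_k) if per_k else set()


# Toy networks (hypothetical node labels and k values, not data from the study).
edges_by_k = {
    20: [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")],
    25: [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")],
}
stable_nodes = nodes_in_cliques_for_all_k(edges_by_k)
print(sorted(stable_nodes))  # ['A', 'B', 'C']
```

In this toy input the triangle A, B, C is a maximal clique at both values of k, so its nodes are reported as stable, whereas D and E only ever appear in two-node cliques. Enumerating maximal cliques is exponential in the worst case, so a sketch like this is suitable only for small networks.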
