Conservation of Gene Cassettes among Diverse Viruses of the Human Gut

Viruses are a crucial component of the human microbiome, but large population sizes, high sequence diversity, and high frequencies of novel genes have hindered genomic analysis by high-throughput sequencing. Here we investigate approaches to metagenomic assembly to probe genome structure in a sample of 5.6 Gb of gut viral DNA sequence from six individuals. Tests showed that a new pipeline based on de Bruijn graph assembly yielded longer contigs, which recruited more reads than contigs from the equivalent non-optimized, single-pass approach. To characterize gene content, viral RefSeq proteins were compared against the assembled viral contigs. This generated a bipartite graph in which functional cassettes link viral contigs together, and the graph revealed a high degree of connectivity between diverse genomes, involving multiple genes of the same functional class. In a second, database-independent step, open reading frames (ORFs) were grouped by their co-occurrence on contigs, revealing conserved cassettes of co-oriented ORFs. These analyses show that free-living bacteriophages, while usually dissimilar at the nucleotide level, often share significant similarity in encoded amino acid motifs, gene order, and gene orientation. These findings thus connect contemporary metagenomic analysis with classical studies of bacteriophage genomic cassettes. Software is available at https://sourceforge.net/projects/optitdba/.
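The two gene-content analyses summarized above can be illustrated with a small Python sketch. It is not the authors' pipeline (their software is at the SourceForge URL given in the abstract); it only shows the shape of each computation under stated assumptions. `contig_links` assumes contig-to-protein matches have already been reduced to hypothetical `(contig_id, protein_id)` pairs, for example from a BLASTX-style comparison against viral RefSeq, and projects that bipartite graph onto contigs, weighting each contig pair by the number of proteins they share. `conserved_cooriented_pairs` assumes ORFs have already been predicted and clustered into hypothetical family IDs, and counts adjacent, co-oriented family pairs that recur on at least `min_contigs` contigs, a simplified stand-in for grouping ORFs by co-occurrence.

```python
from collections import defaultdict
from itertools import combinations


def contig_links(hits):
    """Project a contig-protein bipartite graph onto contigs.

    `hits` is an iterable of hypothetical (contig_id, protein_id) pairs.
    Two contigs are linked when they match at least one common protein;
    the link weight is the number of shared proteins.
    """
    contigs_by_protein = defaultdict(set)
    for contig, protein in hits:
        contigs_by_protein[protein].add(contig)

    links = defaultdict(int)
    for contigs in contigs_by_protein.values():
        # Every pair of contigs hitting the same protein gains one shared node.
        for a, b in combinations(sorted(contigs), 2):
            links[(a, b)] += 1
    return dict(links)


def conserved_cooriented_pairs(contigs, min_contigs=2):
    """Find adjacent, co-oriented ORF-family pairs shared by several contigs.

    `contigs` maps contig_id -> list of (family_id, strand) in contig
    coordinate order, with strand '+' or '-'. Family IDs are assumed to
    come from an upstream ORF clustering step (hypothetical here).
    """
    seen = defaultdict(set)
    for contig_id, orfs in contigs.items():
        for (fam1, s1), (fam2, s2) in zip(orfs, orfs[1:]):
            if s1 != s2:
                continue  # only co-oriented neighbours form a candidate cassette
            # On the minus strand, coordinate order is the reverse of
            # transcriptional order, so flip the pair to normalize it.
            pair = (fam1, fam2) if s1 == "+" else (fam2, fam1)
            seen[pair].add(contig_id)
    return {pair: ids for pair, ids in seen.items() if len(ids) >= min_contigs}
```

Normalizing minus-strand pairs to transcriptional order means a cassette is counted the same way whichever strand a contig happened to be assembled on: a contig carrying families A then B on the plus strand and another carrying B then A on the minus strand both contribute to the pair (A, B).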
