Illuminating structural proteins in viral “dark matter” with metaproteomics

Significance

Marine viruses are abundant and have substantial ecosystem impacts, yet their study is hampered by the dominance of unannotated viral genes. Here, we use metaproteomics and metagenomics to examine virion-associated proteins in marine viral communities, providing tentative functions for 677,000 viral genomic sequences and the majority of previously unknown virion-associated proteins in these samples. The five most abundant protein groups comprised 67% of the metaproteomes and were tentatively identified as capsid proteins of predominantly unknown viruses, all of which putatively contain a protein fold that may be the most abundant biological structure on Earth. This methodological approach is thus shown to be a powerful way to increase our knowledge of the most numerous biological entities on the planet.

Viruses are ecologically important, yet environmental virology is limited by dominance of unannotated genomic sequences representing taxonomic and functional “viral dark matter.” Although recent analytical advances are rapidly improving taxonomic annotations, identifying functional dark matter remains problematic. Here, we apply paired metaproteomics and dsDNA-targeted metagenomics to identify 1,875 virion-associated proteins from the ocean. Over one-half of these proteins were newly functionally annotated and represent abundant and widespread viral metagenome-derived protein clusters (PCs). One primarily unannotated PC dominated the dataset, but structural modeling and genomic context identified this PC as a previously unidentified capsid protein from multiple uncultivated tailed virus families. Furthermore, four of the five most abundant PCs in the metaproteome represent capsid proteins containing the HK97-like protein fold previously found in many viruses that infect all three domains of life. The dominance of these proteins within our dataset, as well as their global distribution throughout the world’s oceans and seas, supports prior hypotheses that this HK97-like protein fold is the most abundant biological structure on Earth. Together, these culture-independent analyses improve virion-associated protein annotations, facilitate the investigation of proteins within natural viral communities, and offer a high-throughput means of illuminating functional viral dark matter.
