MetaGeniE: Characterizing Human Clinical Samples Using Deep Metagenomic Sequencing

As the cost of next-generation sequencing falls, deep sequencing of clinical samples provides unique opportunities to understand host-associated microbial communities. A primary challenge of clinical metagenomic sequencing is rapidly filtering out human reads so that pathogens can be surveyed with high specificity and sensitivity. Metagenomes are inherently variable because of the different microbes present in a sample and their relative abundances, the size and architecture of their genomes, and factors such as the amount of target DNA in tissue samples (i.e., the concentration of pathogen DNA relative to human DNA). In sequencing datasets, this variation typically manifests as low pathogen abundance, a high number of host reads, and the presence of close relatives and complex microbial communities. Beyond these compositional challenges, the large number of reads generated by high-throughput deep sequencing poses substantial computational challenges. Accurate identification of pathogens is confounded when individual reads map to multiple reference genomes, either because genes are similar across taxa present in the community or because close relatives are present in the reference database. Available global and local sequence aligners also vary in sensitivity, specificity, and speed of detection. The efficiency of pathogen detection in clinical samples depends largely on the desired taxonomic resolution. We have developed an efficient strategy that identifies “all against all” relationships between sequencing reads and reference genomes. Our approach scales to large reference databases and then reconstructs genomes by aggregating global and local alignments, allowing genetic characterization of pathogens at higher taxonomic resolution. Results from this approach were consistent with strain-level SNP genotyping and bacterial identification from laboratory culture.
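To make the "all against all" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes alignment hits from any global or local aligner have already been reduced to (read ID, reference ID) pairs and that human reads have been identified separately. The function name `aggregate_hits`, its parameters, and the "unique"/"shared" labels are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def aggregate_hits(hits, human_reads=frozenset()):
    """Pool read-to-reference hits and summarize them per reference.

    hits: iterable of (read_id, reference_id) pairs, pooled from any
          number of aligners (hypothetical input format).
    human_reads: read IDs already classified as host; these are skipped.

    Returns {reference_id: {"unique": n, "shared": m}}, where "unique"
    counts reads that hit only that reference and "shared" counts reads
    that also hit at least one other reference.
    """
    # Collect the set of references each non-human read maps to.
    refs_by_read = defaultdict(set)
    for read_id, ref_id in hits:
        if read_id in human_reads:
            continue
        refs_by_read[read_id].add(ref_id)

    # Count each read once per reference, split by mapping ambiguity.
    summary = defaultdict(lambda: {"unique": 0, "shared": 0})
    for refs in refs_by_read.values():
        kind = "unique" if len(refs) == 1 else "shared"
        for ref_id in refs:
            summary[ref_id][kind] += 1
    return {ref_id: dict(counts) for ref_id, counts in summary.items()}
```

Separating uniquely mapped reads from reads shared across references is one simple way to expose the multi-mapping ambiguity described above; a real pipeline would likely also weigh alignment quality and coverage.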
