MUGAN: multi-GPU accelerated AmpliconNoise server for rapid microbial diversity assessment

Motivation: Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. A typical metagenomic sequencing run produces a large amount of data (often on the order of terabytes or more), so computational tools are indispensable for processing it efficiently. In particular, error correction is crucial for accurate and robust genetic cataloguing of microbial communities. However, many existing error-correction tools take a prohibitively long time and often become the bottleneck of the whole analysis pipeline.

Results: To overcome this computational hurdle, we analyzed the data-level parallelism that exists in the error-correction procedure and developed a tool named MUGAN that exploits it by co-processing on multi-core central processing units (CPUs) and multiple graphics processing units (GPUs). In our experiments, this approach reduced the time required to denoise amplicons from approximately 59 hours to 46 minutes. It also reduced the overestimation of the number of OTUs, estimating 6.7 times fewer species-level OTUs than the baseline. In addition, our approach provides intuitive web-based visualization of results. Given its efficiency and convenience, we anticipate that our approach will greatly facilitate denoising efforts in metagenomics studies.

Availability: http://data.snu.ac.kr/pub/mugan

Contact: sryoon@snu.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.
