Meta-Prism: Ultra-fast and highly accurate microbial community structure search utilizing dual indexing and parallel computation

Microbiome samples are accumulating at an unprecedented speed. As a result, a massive number of samples have become available for mining the intrinsic patterns among them. However, advanced computational tools are lacking, and fast yet accurate comparison and search among thousands to millions of samples are still urgently needed. In this work, we propose the Meta-Prism method for comparing and searching microbial community structures among tens of thousands of samples. Meta-Prism is at least 10 times faster than contemporary methods serving the same purpose and provides highly accurate search results. The method is based on three computational techniques: a dual-indexing approach for sample subgrouping, a refined scoring function that can scrutinize minute differences among samples, and parallel computation on CPU or GPU. The superiority of Meta-Prism in speed and accuracy for multiple-sample searches is demonstrated by searching against ten thousand samples derived from both human and environmental sources. Meta-Prism could therefore facilitate similarity search and in-depth understanding of massive numbers of heterogeneous samples in the microbiome universe. The code of Meta-Prism is available at: https://github.com/HUST-NingKang-Lab/metaPrism.
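
To make the three ingredients concrete, the following is a minimal Python sketch of an index-then-score search. It is not the Meta-Prism implementation: the abstract does not specify the index layout or the scoring function, so every name here (`build_indexes`, `similarity`, `search`, the toy `database`) is hypothetical. The two indexes (by dominant taxon, and by membership in the top-k taxa) are assumed for illustration, the score is a plain shared-abundance measure standing in for the refined scoring function, and a thread pool stands in for CPU or GPU parallelism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def top_taxa(sample, k):
    """Return the k most abundant taxa of a non-empty sample (taxon -> relative abundance)."""
    return sorted(sample, key=sample.get, reverse=True)[:k]


def build_indexes(database, k=2):
    """Assumed dual index: samples grouped by dominant taxon and by each of their top-k taxa."""
    by_dominant = defaultdict(set)
    by_top_k = defaultdict(set)
    for sample_id, sample in database.items():
        ranked = top_taxa(sample, k)
        by_dominant[ranked[0]].add(sample_id)
        for taxon in ranked:
            by_top_k[taxon].add(sample_id)
    return by_dominant, by_top_k


def similarity(a, b):
    """Stand-in score: abundance shared by two normalized profiles (1 - Bray-Curtis)."""
    return sum(min(a[t], b[t]) for t in a.keys() & b.keys())


def search(query, database, indexes, k=2, n_hits=3, workers=4):
    """Score only the subgroup selected by the indexes, in parallel, and rank the hits."""
    by_dominant, by_top_k = indexes
    ranked = top_taxa(query, k)
    # First index: samples sharing the query's dominant taxon.
    candidates = set(by_dominant.get(ranked[0], set()))
    # Second index: widen the subgroup only if the first one is too small.
    if len(candidates) < n_hits:
        for taxon in ranked:
            candidates |= by_top_k.get(taxon, set())
    ids = sorted(candidates)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda sid: similarity(query, database[sid]), ids))
    # Highest score first; ties broken by sample id for reproducibility.
    hits = sorted(zip(ids, scores), key=lambda pair: (-pair[1], pair[0]))
    return hits[:n_hits]


# Toy data, invented for this example only.
database = {
    "gut_1": {"Bacteroides": 0.6, "Prevotella": 0.3, "Escherichia": 0.1},
    "gut_2": {"Bacteroides": 0.5, "Faecalibacterium": 0.4, "Escherichia": 0.1},
    "soil_1": {"Streptomyces": 0.7, "Bacillus": 0.3},
    "soil_2": {"Bacillus": 0.6, "Streptomyces": 0.4},
}
query = {"Bacteroides": 0.55, "Prevotella": 0.35, "Escherichia": 0.10}
hits = search(query, database, build_indexes(database))
```

In this toy run the soil samples are never scored, because neither index places them in the query's subgroup. That pruning is the point of indexing before scoring: the expensive comparison is applied only to a small candidate set, and each comparison is independent, so the scoring step parallelizes trivially.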
