FORMAL: A model to identify organisms present in metagenomes using Monte Carlo Simulation

One of the major goals in metagenomics is to identify the organisms present in a microbial community from a huge set of unknown DNA sequences. This profiling has valuable applications in several important areas of medical research, such as disease diagnostics. Nevertheless, it is not a simple task, and many of the approaches developed so far are slow and depend on the read length of the DNA sequences. Here we introduce an innovative and agile approach that uses k-mers and Monte Carlo simulation to profile the abundant organisms present in metagenomic samples and report their relative abundance, without depending on sequence length. The program was tested with simulated metagenomes, and the results show that our approach predicts the organisms in microbial communities and their relative abundance.
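The abstract does not spell out the algorithm, so the following is only a minimal sketch of how k-mer profiles and Monte Carlo sampling could be combined to estimate relative abundance; it is not the authors' implementation. It assumes the sample's k-mer profile is a weighted mixture of reference-genome profiles, draws random abundance vectors, and keeps the one with the smallest squared error. All function names, the choice of k = 2, and the error measure are hypothetical choices made for illustration.

```python
import itertools
import random
from collections import Counter


def kmer_profile(seq, k=2):
    """Return normalized k-mer frequencies of seq over the ACGT alphabet."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def random_abundances(n, rng):
    """Draw n non-negative weights that sum to 1 (a random point on the simplex)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def mix(profiles, weights):
    """Weighted average of reference profiles (assumed mixture model)."""
    size = len(profiles[0])
    return [sum(w * p[i] for w, p in zip(weights, profiles)) for i in range(size)]


def monte_carlo_profile(sample, references, trials=5000, seed=0):
    """Keep the random abundance vector whose mixture best matches the sample."""
    rng = random.Random(seed)
    best_weights, best_error = None, float("inf")
    for _ in range(trials):
        weights = random_abundances(len(references), rng)
        predicted = mix(references, weights)
        error = sum((a - b) ** 2 for a, b in zip(predicted, sample))
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


if __name__ == "__main__":
    # Two toy "genomes" with different composition (illustrative data only).
    refs = [kmer_profile("AACCAAGT" * 50), kmer_profile("GGTTGGCA" * 50)]
    sample = mix(refs, [0.7, 0.3])  # synthetic sample with known abundances
    weights, error = monte_carlo_profile(sample, refs)
    print([round(w, 3) for w in weights], error)
```

Because the comparison is made on k-mer frequencies rather than on individual reads, a sketch like this does not depend on read length, which matches the property the abstract emphasizes. Plain random sampling is used here for simplicity; it scales poorly as the number of reference genomes grows.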
