TECHNICAL NOTE Open Access

Background: In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Accurate analysis and interpretation of existing and upcoming metagenomes require novel methods that can efficiently process these growing volumes of sequence data.

Findings: Here we present Tentacle, a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed over a network and processed in parallel on worker nodes. Tentacle is modular and extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt to different applications in metagenomics and to integrate into existing workflows.

Conclusions: Evaluations show that Tentacle scales well with increasing computing resources. We illustrate the versatility of Tentacle with three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
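
The dynamic master-worker approach named in the Findings can be illustrated with a minimal Python sketch. This is not Tentacle's code or API: threads stand in for networked worker nodes, an exact substring match stands in for a real sequence aligner, and every name here (`GENES`, `count_hits`, `worker`, `quantify`) is hypothetical. It only shows the general pattern, in which a master hands out chunks of reads, idle workers pull the next chunk as soon as they are free, and the master merges the per-chunk gene counts.

```python
import queue
import threading
from collections import Counter

# Hypothetical toy reference: gene name -> sequence (assumption for illustration).
GENES = {"geneA": "ACGTACGTGG", "geneB": "TTGACCATGA"}


def count_hits(reads, genes):
    """Toy stand-in for an aligner: count reads found verbatim in a gene."""
    counts = Counter()
    for read in reads:
        for name, seq in genes.items():
            if read in seq:
                counts[name] += 1
                break  # assign each read to at most one gene
    return counts


def worker(tasks, results, genes):
    """Pull chunks from the shared queue until the master sends None."""
    while True:
        chunk = tasks.get()
        if chunk is None:
            break
        results.put(count_hits(chunk, genes))


def quantify(chunks, genes, n_workers=3):
    """Master: distribute chunks dynamically and merge the partial counts."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, genes))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()

    n_chunks = 0
    for chunk in chunks:
        tasks.put(chunk)  # whichever worker is free picks this up
        n_chunks += 1
    for _ in threads:
        tasks.put(None)  # one stop signal per worker

    total = Counter()
    for _ in range(n_chunks):
        total.update(results.get())  # Counter.update adds counts
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    chunks = [["ACGT", "TTGA"], ["CATG", "GGGG"], ["CGTG"]]
    print(dict(quantify(chunks, GENES)))
```

Because workers pull work from a shared queue instead of receiving a fixed share up front, a slow chunk delays only the worker that holds it. That is the property that makes the scheme "dynamic" in this sketch.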
