MetaPathways v2.5: quantitative functional, taxonomic and usability improvements

Summary: Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has enormous potential to transform our lives, creating and translating knowledge from it requires software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. Specifically, we address pathway prediction hazards by integrating a weighted taxonomic distance, and we enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches with BLAST-equivalent E-values and with output formats that are natively compatible with widely used software applications. Finally, an updated graphical user interface supports keyword-based annotation queries and the projection of annotations onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme (CAZy) database.

Availability and implementation: MetaPathways v2.5 is available on GitHub: http://github.com/hallamlab/metapathways2.

Contact: shallam@mail.ubc.ca

Supplementary information: Supplementary data are available at Bioinformatics online.
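The summary names a weighted taxonomic distance but does not define it. As a rough illustration of the general idea only, the Python sketch below computes a depth-weighted distance between two lineages on a taxonomy tree: the lineages are compared from the root down, and every edge below their lowest common ancestor contributes a weight that shrinks with depth, so disagreement at the domain level costs more than disagreement at the class level. The function name `weighted_lineage_distance`, the geometric `decay` parameter and the example lineages are hypothetical; this is not the published MetaPathways definition.

```python
# Hypothetical depth-weighted lineage distance, for illustration only;
# this is NOT the published MetaPathways weighted taxonomic distance.


def weighted_lineage_distance(a: list[str], b: list[str], decay: float = 0.5) -> float:
    """Distance between two root-to-leaf lineages (the root itself is omitted).

    The edge leading into depth d (domain = depth 1) weighs decay ** (d - 1),
    so mismatches near the root cost more than mismatches near the tips.
    """
    # Depth of the lowest common ancestor = length of the shared prefix.
    lca_depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        lca_depth += 1

    def climb(lineage: list[str]) -> float:
        # Sum edge weights from the lineage's deepest node up to the LCA.
        return sum(decay ** (d - 1) for d in range(lca_depth + 1, len(lineage) + 1))

    return climb(a) + climb(b)


if __name__ == "__main__":
    gamma = ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]
    alpha = ["Bacteria", "Proteobacteria", "Alphaproteobacteria"]
    archaeon = ["Archaea", "Euryarchaeota"]
    print(weighted_lineage_distance(gamma, alpha))     # 0.5
    print(weighted_lineage_distance(gamma, archaeon))  # 3.25
```

Two sister classes differ only by their deepest edges (0.25 each), whereas a bacterial and an archaeal lineage diverge at the root and accumulate every edge weight on both sides.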
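The summary does not name the normalized read-mapping measure. Assuming an RPKM-style normalization (reads per kilobase of ORF per million mapped reads), which is one common way to make read counts comparable across ORFs of different lengths and samples of different sequencing depth, a minimal sketch looks like this. The function `rpkm`, its arguments and the example counts are hypothetical.

```python
# Assumes an RPKM-style normalization; the summary does not name the measure.


def rpkm(read_counts: dict[str, int], orf_lengths_bp: dict[str, int],
         total_mapped_reads: int) -> dict[str, float]:
    """Reads per kilobase of ORF per million mapped reads.

    RPKM = 1e9 * C / (L * N), where C is the number of reads mapped to an ORF,
    L the ORF length in base pairs and N the total number of mapped reads.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    values = {}
    for orf, count in read_counts.items():
        length = orf_lengths_bp[orf]
        if length <= 0:
            raise ValueError(f"ORF {orf!r} has non-positive length")
        values[orf] = 1e9 * count / (length * total_mapped_reads)
    return values


if __name__ == "__main__":
    counts = {"orf_1": 500, "orf_2": 500}
    lengths = {"orf_1": 1000, "orf_2": 2000}
    # Same raw count, but orf_2 is twice as long, so its RPKM is half as large.
    print(rpkm(counts, lengths, total_mapped_reads=1_000_000))
    # {'orf_1': 500.0, 'orf_2': 250.0}
```

Dividing by both ORF length and total mapped reads is what allows values from different assemblies or samples to be compared directly.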
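BLAST reports E-values using Karlin-Altschul statistics, E = K·m·n·exp(−λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K are constants fixed by the scoring scheme. The sketch below applies that formula, and the equivalent bit-score form E = m·n·2^(−S′), to a raw score. It is not the MetaPathways code. The λ and K values are the commonly cited gapped BLOSUM62 (gap open 11, extend 1) parameters and are used only for illustration: LAST scoring schemes have their own values, and BLAST additionally applies edge-effect length corrections that are omitted here.

```python
import math

# Illustrative parameters only: commonly cited gapped BLOSUM62 values
# (gap open 11, extend 1). Other scoring schemes need their own lambda and K.
LAMBDA, K = 0.267, 0.041


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= raw_score.

    E = K * m * n * exp(-lambda * S), with m the query length and n the total
    database length (no edge-effect length correction).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    s, m, n = 120, 300, 1_000_000_000
    e = evalue(s, m, n, LAMBDA, K)
    b = bit_score(s, LAMBDA, K)
    # Both forms agree: E == m * n * 2 ** (-bits)
    print(f"E = {e:.3g}, bits = {b:.1f}, check = {m * n * 2 ** (-b):.3g}")
```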
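To illustrate keyword annotation queries and projection onto a user-defined functional hierarchy, the sketch below filters ORF annotations by a case-insensitive keyword and then sums per-family counts up a small hierarchy built from a few CAZy classes and families. The annotation strings, the `<family>: <description>` format, the hierarchy layout and the function names `keyword_query`, `family_counts` and `project` are hypothetical; the MetaPathways GUI performs these operations interactively, and its internal data model may differ.

```python
from collections import Counter

# Hypothetical hierarchy built from a few real CAZy classes and families.
# Internal nodes are dicts; leaves are lists of family identifiers.
HIERARCHY = {
    "CAZy": {
        "Glycoside Hydrolases": ["GH5", "GH13"],
        "GlycosylTransferases": ["GT2"],
        "Carbohydrate-Binding Modules": ["CBM48"],
    }
}

# Hypothetical ORF annotations in a "<family>: <description>" format.
ANNOTATIONS = {
    "orf_1": "GH5: cellulase",
    "orf_2": "GH13: alpha-amylase",
    "orf_3": "GT2: cellulose synthase",
    "orf_4": "GH5: endo-beta-1,4-mannanase",
}


def keyword_query(annotations: dict[str, str], keyword: str) -> dict[str, str]:
    """Keep ORFs whose annotation text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {orf: text for orf, text in annotations.items() if kw in text.lower()}


def family_counts(annotations: dict[str, str]) -> Counter:
    """Count ORFs per family, taking the text before the first ':' as the family."""
    return Counter(text.split(":", 1)[0].strip() for text in annotations.values())


def project(tree, counts: Counter, path: tuple[str, ...] = ()) -> dict[tuple[str, ...], int]:
    """Return {node path: ORF count} for every node, summing counts up the tree."""
    totals: dict[tuple[str, ...], int] = {}
    if isinstance(tree, dict):
        subtotal = 0
        for name, child in tree.items():
            child_path = path + (name,)
            totals.update(project(child, counts, child_path))
            subtotal += totals[child_path]
        totals[path] = subtotal
    else:
        # Leaf list: one entry per family, then the sum for this node.
        for family in tree:
            totals[path + (family,)] = counts.get(family, 0)
        totals[path] = sum(counts.get(family, 0) for family in tree)
    return totals


if __name__ == "__main__":
    hits = keyword_query(ANNOTATIONS, "cellul")  # matches orf_1 and orf_3
    for node, n in project(HIERARCHY, family_counts(hits)).items():
        print(" / ".join(node) or "(root)", n)
```

Keeping the hierarchy as plain nested data means a user can swap in a different classification without changing the projection code.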
