MAIRA: real-time taxonomic and functional analysis of long reads on a laptop

Background: Advances in mobile sequencing devices and laptop performance make metagenomic sequencing and analysis in the field technologically feasible. However, metagenomic analysis pipelines are usually designed to run on servers or in the cloud.

Results: MAIRA is a new standalone program for interactive taxonomic and functional analysis of long-read metagenomic sequencing data on a laptop, without requiring external resources. The program performs fast, online, genus-level analysis, followed by on-demand, detailed taxonomic and functional analysis. It uses two levels of frame-shift-aware alignment of DNA reads against protein reference sequences, and then performs the detailed analysis using a protein synteny graph.

Conclusions: We envision this software being used by researchers in the field, where access to servers or cloud facilities is difficult, and by individuals who do not routinely use such facilities, such as medical researchers, crop scientists, or teachers.
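To illustrate why frame-shift awareness matters when aligning long DNA reads against protein references, the following minimal Python sketch translates a toy read in different reading frames. It is not MAIRA's implementation: the function name `translate`, the toy sequence, and the position of the simulated deletion are all assumptions made for illustration. It only shows that a single-base deletion, a typical long-read sequencing error, moves the correct protein signal from one reading frame to another.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i] for i, codon in enumerate(product(BASES, repeat=3))
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at offset `frame` (0, 1 or 2).

    Hypothetical helper for illustration; trailing bases that do not
    form a complete codon are ignored.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Toy gene fragment (assumed example, not real data).
gene = "ATGGCTAAAGAACTGGTTCGT"
# Simulate a sequencing error: delete the base at index 9.
read = gene[:9] + gene[10:]

print(translate(gene, 0))  # error-free translation
print(translate(read, 0))  # frame 0 is correct only before the deletion
print(translate(read, 2))  # frame 2 recovers the protein after the deletion
```

In this toy case, frame 0 of the erroneous read matches the reference protein only up to the deletion, while frame 2 matches it afterwards. A frame-shift-aware aligner can switch frames within a single alignment instead of reporting two short, unrelated hits.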
