MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads

MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service performs phylogenetic and functional profiling of metagenomic samples using the coverage depth obtained by aligning the reads to a catalogue of reference sequences. This catalogue is built into the pipeline and contains prevalent microbial genomes and genes of the human gut microbiota. The resulting metagenomic composition vectors are processed by a statistical analysis and visualization module that provides methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org), so that they can be analysed comparatively together with user samples. MALINA is freely available on the web at http://malina.metagenome.ru. The website is implemented using JavaScript (with Ext JS), Microsoft .NET Framework, MS SQL and Python, and supports all major browsers.
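The step from coverage depth to a composition vector can be illustrated with a minimal sketch. It is not MALINA's actual implementation: the function names, the normalization to unit sum, and the choice of Bray-Curtis dissimilarity for comparing samples are assumptions made here for illustration only, as are the reference names and numbers in the usage example.

```python
from typing import Dict


def coverage_depth(aligned_bases: Dict[str, int],
                   ref_lengths: Dict[str, int]) -> Dict[str, float]:
    """Mean coverage depth per reference: aligned bases / reference length.

    Both inputs are hypothetical: they stand for per-reference totals
    that an aligner-based pipeline would produce.
    """
    return {ref: aligned_bases.get(ref, 0) / length
            for ref, length in ref_lengths.items()}


def composition_vector(depth: Dict[str, float]) -> Dict[str, float]:
    """Scale coverage depths so they sum to 1 (assumed normalization)."""
    total = sum(depth.values())
    if total == 0:
        return {ref: 0.0 for ref in depth}
    return {ref: d / total for ref, d in depth.items()}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two composition vectors."""
    refs = set(a) | set(b)
    num = sum(abs(a.get(r, 0.0) - b.get(r, 0.0)) for r in refs)
    den = sum(a.get(r, 0.0) + b.get(r, 0.0) for r in refs)
    return num / den if den else 0.0


# Usage with made-up numbers for two hypothetical reference genomes.
lengths = {"genome_A": 1000, "genome_B": 1000}
sample_1 = composition_vector(
    coverage_depth({"genome_A": 3000, "genome_B": 1000}, lengths))
sample_2 = composition_vector(
    coverage_depth({"genome_A": 1000, "genome_B": 3000}, lengths))
distance = bray_curtis(sample_1, sample_2)
```

A pairwise dissimilarity of this kind is the usual input to clustering and dimension reduction methods, which is why it is shown here as the last step.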
