Graph Neural Networks for Microbial Genome Recovery

Microbes have a profound impact on our health and environment, but our understanding of the diversity and function of microbial communities is severely limited. Through DNA sequencing of microbial communities (metagenomics), DNA fragments (reads) of the individual microbes can be obtained, which through assembly graphs can be combined into long contiguous DNA sequences (contigs). Given the complexity of microbial communities, single-contig microbial genomes are rarely obtained. Instead, contigs are eventually clustered into bins, with each bin ideally making up a full genome. This process is referred to as metagenomic binning. [...] autoencoders [...] representations [...] We explore several types of GNNs and demonstrate that VAEG-BIN recovers more high-quality genomes than other state-of-the-art binners on both simulated and real-world datasets.
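To make the idea of binning with an assembly graph concrete, the following is a toy sketch, not the method evaluated in this work. It assumes each contig has a two-dimensional feature vector (hypothetically, GC content and scaled coverage), applies one round of neighbor averaging over the assembly graph as a stand-in for a GNN layer, and then groups contigs with a simple greedy distance threshold. All names, feature values, edges, and the threshold are invented for illustration.

```python
import math
from collections import defaultdict


def smooth_features(features, edges, self_weight=0.5):
    """Mix each contig's features with the mean of its graph neighbors (one round)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    smoothed = {}
    for contig, vec in features.items():
        nbrs = sorted(neighbors[contig])
        if not nbrs:
            # Isolated contigs keep their own features unchanged.
            smoothed[contig] = list(vec)
            continue
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(vec))]
        smoothed[contig] = [self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)]
    return smoothed


def bin_contigs(features, threshold):
    """Greedy binning: join the first bin whose first member is within threshold."""
    bins = []
    for contig, vec in features.items():
        for members in bins:
            if math.dist(vec, features[members[0]]) <= threshold:
                members.append(contig)
                break
        else:
            bins.append([contig])
    return bins


# Hypothetical contigs: (GC content, scaled coverage). c3 has noisy features
# but is linked to c1 and c2 in the (made-up) assembly graph.
features = {
    "c1": [0.40, 0.20],
    "c2": [0.42, 0.22],
    "c3": [0.55, 0.50],
    "c4": [0.70, 0.80],
    "c5": [0.72, 0.82],
}
edges = [("c1", "c2"), ("c2", "c3"), ("c1", "c3"), ("c4", "c5")]

raw_bins = bin_contigs(features, threshold=0.25)
graph_bins = bin_contigs(smooth_features(features, edges), threshold=0.25)
print("features only:", raw_bins)
print("with graph smoothing:", graph_bins)
```

In this toy setup, the noisy contig c3 ends up in a bin of its own when only its own features are used, and joins its graph neighbors once neighbor information is mixed in. A real binner would use learned representations and a proper clustering algorithm rather than a fixed threshold.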
