ngsLCA—A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data

Metagenomic data generated from environmental samples is increasingly common in the analysis of modern and ancient biological communities. To obtain taxonomic profiles from this type of data, DNA sequences are aligned against large genomic reference databases, and the lowest common ancestor (LCA) must be inferred for each sequence that has multiple alignments. To date, efforts have mainly focused on improving the speed, sensitivity and specificity of alignment tools, and little effort has been applied to the LCA algorithm that generates the taxonomic profiles from alignments. We present ngsLCA, a command‐line toolkit with two separate modules: the main program (in C/C++), which performs LCA inference, and an R package for generating tables and visualisations of the taxonomic profiles. ngsLCA processed large datasets in BAM/SAM alignment format 4–11 times faster and used less memory than other available programs. It is compatible with the NCBI taxonomy and has flexible parameter settings. Furthermore, the toolkit offers functions for filtering, contamination removal, taxonomic clustering, and multiple ways of visualising the generated taxonomic profiles. ngsLCA bridges a gap in current metagenomic analyses by supplying a computationally light, easy‐to‐use, accurate, fast and flexible LCA algorithm with R functions for processing and illustrating the taxonomic profiles.
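
The LCA step described above can be illustrated with a minimal Python sketch. It is not ngsLCA's C/C++ implementation and does not read BAM/SAM files or the NCBI taxonomy dump. It assumes a toy taxonomy stored as a child-to-parent map with made-up taxon IDs, and it assumes each read's aligned reference taxa have already been collected into a list. The names `PARENT`, `lineage`, `lowest_common_ancestor` and `profile` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Toy taxonomy: child taxon ID -> parent taxon ID (the root points to itself).
# These IDs are invented for illustration and are not real NCBI taxon IDs.
PARENT: Dict[int, int] = {
    1: 1,    # root
    10: 1,   # a family
    20: 10,  # genus A
    21: 10,  # genus B
    30: 20,  # species A1
    31: 20,  # species A2
    40: 21,  # species B1
}


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Return the path from taxid up to the root, lowest rank first."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids: Iterable[int], parent: Dict[int, int]) -> int:
    """Return the deepest node shared by the lineages of all given taxa."""
    taxids = list(taxids)
    if not taxids:
        raise ValueError("at least one taxon ID is required")
    # Start from the first lineage and keep only nodes present in every other one.
    common = lineage(taxids[0], parent)
    for taxid in taxids[1:]:
        seen = set(lineage(taxid, parent))
        common = [node for node in common if node in seen]
    # 'common' is still ordered lowest rank first, so its head is the LCA.
    return common[0]


def profile(read_hits: Dict[str, List[int]], parent: Dict[int, int]) -> Counter:
    """Count reads per assigned taxon, assigning each read to the LCA of its hits."""
    return Counter(lowest_common_ancestor(hits, parent) for hits in read_hits.values())


# Hypothetical reads with the taxa of the references they aligned to.
reads = {
    "read1": [30, 31],  # two species of genus A -> assigned to genus A (20)
    "read2": [30, 40],  # species from different genera -> assigned to the family (10)
    "read3": [30],      # a single hit -> stays at species level (30)
}
counts = profile(reads, PARENT)
```

A read with hits to a single species stays at that species, while a read with hits spread across genera is pushed up to the rank they share. This is why the choice of LCA parameters shapes the resulting taxonomic profile.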
