Facilitating accessible, rapid, and appropriate processing of ancient metagenomic data with AMDirT

Background: Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.github.io) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering the information needed to download and prepare such data is difficult because laboratory and bioinformatic metadata are recorded heterogeneously in prose-based publications.

Methods: Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata for existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate automated metadata curation and data validation, as well as rapid data filtering and downloading.

Results: AncientMetagenomeDir was extended to include standardised metadata for over 5000 ancient metagenomic libraries. The companion tool 'AMDirT' provides access to these metadata through both graphical and command-line interfaces, for users from a wide range of computational backgrounds. We also report on metadata reporting errors that appear to occur commonly during data upload, and provide suggestions on how the community can improve the quality of data sharing.

Conclusions: Together, standardised metadata and tooling will make it easier to incorporate and reuse public ancient metagenomic datasets in future analyses.
