Empowering bioinformatics communities with Nextflow and nf-core

Standardised analysis pipelines are an important part of FAIR bioinformatics research. Over the last decade, there has been a notable shift from point-and-click pipeline solutions such as Galaxy towards command-line solutions such as Nextflow and Snakemake. We report on recent developments in the nf-core and Nextflow frameworks that have led to widespread adoption across many scientific communities. We describe how adopting nf-core standards enables faster development, improved interoperability, and collaboration with the >8,000 members of the nf-core community. The recent development of Nextflow Domain-Specific Language 2 (DSL2) allows pipeline components to be shared and combined across projects. The nf-core community has harnessed this with a library of modules and subworkflows that can be integrated into any Nextflow pipeline, enabling research communities to progressively transition to nf-core best practices. We present a case study of nf-core adoption by six European research consortia, grouped under the EuroFAANG umbrella and dedicated to farmed animal genomics. We believe that the process outlined in this report can inspire many large consortia to seek harmonisation of their data analysis procedures.
