No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics

The current state of much of the Wuhan pneumonia virus (SARS-CoV-2) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here we show that community efforts in developing open analytical software tools over the past ten years, combined with national investments in scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.

John Chilton, et al. No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens, 2020.