Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research

Abstract

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae and the cause of the infectious disease COVID-19. Although the biology of coronaviruses has been studied for many years, bioinformatics tools designed explicitly for SARS-CoV-2 have been developed only recently, in rapid response to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, and the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories.

Contact: evbc@unj-jena.de
