GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records

Summary
GeoBoost is a command-line software package that addresses sparse or incomplete metadata in virus GenBank records concerning the location of the infected host (LOIH). Given a set of GenBank accession numbers for virus records, GeoBoost extracts, integrates and normalizes geographic information reflecting the LOIH, combining GenBank metadata with information from related full-text publications. In addition, to support probabilistic geospatial modeling, GeoBoost assigns a probability score to each candidate LOIH.

Availability and implementation
Binaries and resources required for running GeoBoost are packaged into a single zipped file, freely available for download at https://tinyurl.com/geoboost. A video tutorial is included to help users install and run the software quickly. The software is implemented in Java 1.8 and is supported on MS Windows and Linux.

Contact
gragon@upenn.edu

Supplementary information
Supplementary data are available at Bioinformatics online.
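The idea of assigning a probability score to each candidate LOIH can be illustrated with a minimal sketch. It is not GeoBoost's actual scoring method, which is not described here. It assumes that each location mention arrives as a (location, source) pair and that mentions from GenBank metadata count more than mentions from publications. The weights in `SOURCE_WEIGHTS`, and the names `normalize_name` and `score_locations`, are hypothetical and used only for illustration. Mentions of the same place are merged after a crude name normalization, and the summed weights are rescaled to sum to 1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical evidence weights (an assumption for illustration only,
# not values used by GeoBoost): metadata counts more than a paper mention.
SOURCE_WEIGHTS = {"metadata": 1.0, "publication": 0.5}


def normalize_name(name: str) -> str:
    """Crude normalization: collapse whitespace and case-fold, so that
    'Peru' and ' peru' are treated as the same candidate location."""
    return " ".join(name.split()).casefold()


def score_locations(mentions: List[Tuple[str, str]]) -> Dict[str, float]:
    """Convert (location, source) mentions into a probability distribution
    over candidate locations by summing source weights and normalizing."""
    totals: Dict[str, float] = defaultdict(float)
    for location, source in mentions:
        # Unknown sources contribute no evidence.
        totals[normalize_name(location)] += SOURCE_WEIGHTS.get(source, 0.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No usable evidence: return an empty distribution.
        return {}
    return {loc: weight / grand_total for loc, weight in totals.items()}


if __name__ == "__main__":
    mentions = [("Peru", "metadata"), (" peru", "publication"), ("Ecuador", "publication")]
    print(score_locations(mentions))  # {'peru': 0.75, 'ecuador': 0.25}
```

Normalizing the summed weights means the scores always sum to 1 whenever any evidence is present, so they can be used directly as a discrete prior over locations. A real system would also need to resolve place names against a gazetteer rather than rely only on string normalization.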