GenomeWarp: an alignment-based variant coordinate transformation

Abstract

Summary: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome.

Availability and implementation: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp.

Supplementary information: Supplementary data are available at Bioinformatics online.
