Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants

Motivation: Many rare diseases and cancers are fundamentally diseases of the genome. In the past several years, genome sequencing has become one of the most important tools in clinical practice for rare disease diagnosis and targeted cancer therapy. However, variant interpretation remains the bottleneck, as it is not yet automated and may take a specialist several hours of work per patient. On average, one-fifth of this time is spent visually confirming the authenticity of the candidate variants.

Results: We developed Skyhawk, an artificial neural network-based discriminator that mimics the process of expert review of clinically significant genomic variants. Skyhawk reviews ten thousand variants in less than one minute. Among the false positive singletons identified by GATK HaplotypeCaller, UnifiedGenotyper and 16GT in the HG005 GIAB sample, 79.7% were rejected by Skyhawk.

Availability: Skyhawk is easy to use and freely available at https://github.com/aquaskyline/Skyhawk

Contact: rbluo@cs.hku.hk

Supplementary information: Supplementary data are available at Bioinformatics online.
