SVFX: a machine learning framework to quantify the pathogenicity of structural variants

Although genomic structural variants (SVs) play a crucial role in many diseases, few approaches exist for identifying which of them are pathogenic. We present SVFX, a mechanism-agnostic, machine learning-based workflow that assigns pathogenicity scores to somatic and germline SVs. Specifically, we build separate somatic and germline models, trained on genomic, epigenomic, and conservation-based features of SV call sets from diseased and healthy individuals. Applied to SVs in cancer and other diseases, SVFX identifies pathogenic SVs with high accuracy. The SVs it predicts to be pathogenic in cancer cohorts are enriched among known cancer genes and in many cancer-related pathways.
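The idea of training on SVs from diseased versus healthy individuals and reading the model output as a pathogenicity score can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the classifier, so a plain logistic regression stands in for it; the three features (exon overlap, epigenomic signal, conservation) are hypothetical summaries of the feature families mentioned; and the feature values are synthetic toy numbers, not data from the paper.

```python
import math

def score(x, weights, bias):
    """Return a pathogenicity-like score in (0, 1) for one feature vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    This is a stand-in classifier, not the model used by SVFX.
    """
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = score(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic toy features per SV (hypothetical, each scaled to [0, 1]):
# [fraction overlapping exons, mean epigenomic signal, mean conservation]
disease_svs = [[0.9, 0.8, 0.9], [0.7, 0.6, 0.8], [0.8, 0.9, 0.7]]
healthy_svs = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.2], [0.2, 0.1, 0.3]]

features = disease_svs + healthy_svs
labels = [1] * len(disease_svs) + [0] * len(healthy_svs)

weights, bias = train_logistic(features, labels)

# Score two unseen toy SVs; a higher value means "more disease-like".
print(score([0.85, 0.7, 0.8], weights, bias))
print(score([0.05, 0.2, 0.1], weights, bias))
```

In this framing, separate somatic and germline models would simply be two such fits, each trained on its own pair of disease and control call sets.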
