Towards Recovering Allele-Specific Cancer Genome Graphs

Integrated analysis of structural variants (SVs) and copy number alterations (CNAs) in aneuploid cancer genomes is key to understanding tumor genome complexity. The recently developed algorithm Weaver can estimate, for the first time, the allele-specific copy number of SVs and their interconnectivity in aneuploid cancer genomes. However, a major limitation is that not all SVs identified by Weaver are phased. In this paper, we develop a general convex programming framework that predicts the interconnectivity of unphased SVs, taking possibly noisy allele-specific copy number estimates as input. Through applications to both simulated data and HeLa whole-genome sequencing data, we demonstrate that our method is robust to noise in the input copy numbers and predicts SV phasings with high specificity. We also find that our predictions remain consistent with those of Weaver even when a large proportion of the input variants are unphased. Finally, we apply our method to TCGA ovarian cancer whole-genome sequencing samples to phase SVs left unphased by Weaver. Our work provides an important new algorithmic framework for recovering more complete allele-specific cancer genome graphs.
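To make the idea of phasing an SV from allele-specific copy numbers concrete, the following Python sketch works through a deliberately simplified, hypothetical instance; it is not the formulation used in this paper. It assumes a single deletion-like SV with two breakpoints, each lying between a flanking segment and the deleted segment, and a two-haplotype model with haplotypes "A" and "B". On haplotype h, the copy-number drop across each breakpoint should equal x_h, the SV's copy number on that haplotype, subject to x_A + x_B equaling the (noisy) total SV copy number. The function name `phase_sv` and all numbers are illustrative assumptions. Minimizing the summed absolute residual is a convex, piecewise-linear problem. With one SV it reduces to a single variable and can be solved exactly by checking the kinks, whereas an instance with many SVs and segments would instead be handed to a general LP or convex solver.

```python
# Hypothetical toy model (not the paper's formulation): phase ONE unphased,
# deletion-like SV between haplotypes "A" and "B" from noisy allele-specific
# copy numbers by minimizing a convex L1 balance residual.
from typing import Dict, List, Tuple

Haplo = Dict[str, float]  # allele-specific copy number, e.g. {"A": 1.0, "B": 0.9}
HAPLOTYPES = ("A", "B")


def phase_sv(breakpoints: List[Tuple[Haplo, Haplo]],
             sv_cn: float) -> Tuple[float, float, float]:
    """Return (x_A, x_B, residual) minimizing sum_k sum_h |drop_kh - x_h|.

    Each breakpoint is an (outside, inside) pair of segment copy numbers;
    x_h >= 0 is the SV copy number on haplotype h and x_A + x_B == sv_cn.
    """
    # Copy-number drop on each haplotype across each breakpoint.
    drops = [{h: out[h] - ins[h] for h in HAPLOTYPES} for out, ins in breakpoints]

    def objective(x_a: float) -> float:
        x = {"A": x_a, "B": sv_cn - x_a}  # the sum constraint eliminates x_B
        return sum(abs(d[h] - x[h]) for d in drops for h in HAPLOTYPES)

    # The objective is convex and piecewise linear in x_A, so its minimum on
    # [0, sv_cn] lies at an endpoint or at a kink where some term is zero.
    candidates = [0.0, sv_cn]
    for d in drops:
        candidates += [d["A"], sv_cn - d["B"]]
    candidates = [min(max(x, 0.0), sv_cn) for x in candidates]  # stay feasible
    best = min(candidates, key=objective)
    return best, sv_cn - best, objective(best)


if __name__ == "__main__":
    deleted = {"A": 0.9, "B": 0.2}              # noisy copy number of deleted segment
    left = ({"A": 1.0, "B": 1.1}, deleted)      # left flank vs deleted segment
    right = ({"A": 1.05, "B": 0.9}, deleted)    # right flank vs deleted segment
    x_a, x_b, res = phase_sv([left, right], sv_cn=0.8)
    print(f"x_A={x_a:.2f} x_B={x_b:.2f} residual={res:.2f}")  # x_A=0.10 x_B=0.70 residual=0.25
    print("phased to", "A" if x_a > x_b else "B")             # phased to B
```

In this toy example the copy-number drop on haplotype B (about 0.9 and 0.7) tracks the SV copy number far better than the drop on haplotype A (about 0.1 and 0.15), so most of the SV's copy number is assigned to B despite the noise. The L1 objective is used here because it is convex and tolerant of a few badly estimated copy numbers; other convex penalties would also fit the same setup.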