Representing and decomposing genomic structural variants as balanced integer flows on sequence graphs

Background: The study of genomic variation has provided key insights into the functional role of mutations. Most studies have focused on single nucleotide variants (SNVs), which are relatively easy to detect and can be described with rich mathematical models. However, genomes are highly plastic: whole regions can be moved, removed or duplicated in bulk. These structural variants (SVs) have been shown to have a significant impact on phenotype, but their study has been held back by the combinatorial complexity of the underlying models.

Results: We describe here a general model of structural variation that encompasses both balanced rearrangements and arbitrary copy-number variants (CNVs).

Conclusions: In this model, we show that the space of possible evolutionary histories that explain the structural differences between any two genomes can be sampled ergodically.
