Inferring the global structure of chromosomes from structural variations

Background: Next-generation sequencing (NGS) technologies have made it possible to exhaustively detect structural variations (SVs) in genomes. Although various methods for detecting SVs have been developed, the global structure of chromosomes, i.e., how segments in a reference genome are extracted and ordered in an unknown target genome, cannot be inferred by detecting only individual SVs.

Results: Here, we formulate the problem of inferring the global structure of chromosomes from SVs as an optimization problem on a bidirected graph. This problem takes into account the aberrant adjacencies of genomic regions, the copy numbers, and the number and length of chromosomes. Although the problem is NP-complete, we propose a polynomial-time solvable variant obtained by restricting instances of the problem with a biologically meaningful condition, which we call the weakly connected constraint. We also explain how to obtain experimental data that satisfy the weakly connected constraint.

Conclusion: Our results establish a theoretical foundation for the development of practical computational tools that could be used to infer the global structure of chromosomes based on SVs. The computational complexity of the inference can be reduced by detecting the segments of the reference genome at the ends of the chromosomes of the target genome and also the segments that are known to exist in the target genome.
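To make the ingredients of the formulation more concrete, the following minimal Python sketch shows one common way to encode segments, copy numbers, and adjacencies in a bidirected-graph style. It is an illustration under stated assumptions, not the formulation or algorithm of the paper: the function name `end_deficits`, the `"t"`/`"h"` (tail/head) labels, and the toy deletion example are all hypothetical. The sketch only checks a copy-number balance condition: at each segment extremity, the adjacencies used cannot exceed the segment's copy number, and any shortfall is read as a chromosome end at that extremity.

```python
from collections import defaultdict

def end_deficits(copy_number, adjacencies):
    """Return, for each segment extremity, how many chromosome ends it must host.

    copy_number: dict mapping segment name -> copy number in the target genome.
    adjacencies: dict mapping ((seg, end), (seg, end)) -> multiplicity,
                 where end is "t" (tail) or "h" (head). Hypothetical encoding.
    """
    used = defaultdict(int)
    for (u, v), multiplicity in adjacencies.items():
        # An adjacency joining an extremity to itself counts twice there.
        used[u] += multiplicity
        used[v] += multiplicity

    deficits = {}
    for seg, cn in copy_number.items():
        for end in ("t", "h"):
            d = cn - used[(seg, end)]
            if d < 0:
                raise ValueError(
                    f"extremity {(seg, end)} is used {used[(seg, end)]} times "
                    f"but segment {seg} has copy number {cn}"
                )
            deficits[(seg, end)] = d
    return deficits

# Toy example (assumed data): reference order A, B, C; the target has lost B,
# so the head of A is joined directly to the tail of C.
copy_number = {"A": 1, "B": 0, "C": 1}
adjacencies = {(("A", "h"), ("C", "t")): 1}

deficits = end_deficits(copy_number, adjacencies)
# Each linear chromosome contributes exactly two ends.
n_linear_chromosomes = sum(deficits.values()) // 2
print(deficits)
print(n_linear_chromosomes)
```

In this toy instance the only unbalanced extremities are the tail of A and the head of C, which is consistent with a single linear chromosome A-C. The sketch does not attempt the optimization itself, nor the weakly connected constraint described above.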
