A geometric approach for classification and comparison of structural variants

Motivation: Structural variants, including duplications, insertions, deletions and inversions of large blocks of DNA sequence, are an important contributor to human genome variation. Measuring structural variants in a genome sequence is typically more challenging than measuring single nucleotide changes. Current approaches for structural variant identification, including paired-end DNA sequencing/mapping and array comparative genomic hybridization (aCGH), do not identify the boundaries of variants precisely. Consequently, most reported human structural variants are poorly defined and not readily compared across different studies and measurement techniques.

Results: We introduce Geometric Analysis of Structural Variants (GASV), a geometric approach for identification, classification and comparison of structural variants. This approach represents the uncertainty in measurement of a structural variant as a polygon in the plane, and identifies measurements supporting the same variant by computing intersections of polygons. We derive a computational geometry algorithm to efficiently identify all such intersections. We apply GASV to sequencing data from nine individual human genomes and several cancer genomes. We obtain better localization of the boundaries of structural variants, distinguish genetic from putative somatic structural variants in cancer genomes, and integrate aCGH and paired-end sequencing measurements of structural variants. This work presents the first general framework for comparing structural variants across multiple samples and measurement techniques, and will be useful for studies of both genetic structural variants and somatic rearrangements in cancer.

Availability: http://cs.brown.edu/people/braphael/software.html

Contact: braphael@brown.edu
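
The polygon idea can be made concrete with a small sketch. It assumes one plausible formalization for a convergent (+/-) paired-end read whose ends map at reference positions x < y, with fragment length between l_min and l_max: a breakpoint (a, b) is consistent with the read when x <= a, b <= y and l_min <= (a - x) + (y - b) <= l_max, which describes a trapezoid in the (a, b) plane. The function names, that constraint set and the sample coordinates are illustrative assumptions, not GASV's code. Overlaps are found with a simple sweep-and-prune pass over the a-axis followed by exact Sutherland-Hodgman clipping of convex polygons; this is a swapped-in technique, not the computational geometry algorithm the abstract refers to.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (a, b): candidate breakpoint coordinates


def breakpoint_polygon(x: float, y: float, l_min: float, l_max: float) -> List[Point]:
    """Trapezoid of breakpoints (a, b) consistent with one convergent (+/-) read pair.

    Assumed constraints (illustrative): x <= a, b <= y and
    l_min <= (a - x) + (y - b) <= l_max. Vertices are counter-clockwise.
    """
    return [(x + l_min, y), (x, y - l_min), (x, y - l_max), (x + l_max, y)]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); positive when b lies left of o->a
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _line_intersection(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Point where segment p->q crosses the infinite line through a and b
    d1, d2 = _cross(a, b, p), _cross(a, b, q)
    t = d1 / (d1 - d2)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_convex(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: part of `subject` inside the convex CCW polygon `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        if not output:
            break
        inp, output = output, []
        for j in range(len(inp)):
            prev, cur = inp[j - 1], inp[j]  # j - 1 == -1 wraps to the last vertex
            cur_in, prev_in = _cross(a, b, cur) >= 0, _cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_line_intersection(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_line_intersection(prev, cur, a, b))
    return output


def polygon_area(poly: List[Point]) -> float:
    # Shoelace formula; degenerate inputs (fewer than 3 vertices) give 0
    s = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def intersecting_pairs(polys: List[List[Point]]) -> List[Tuple[int, int]]:
    """Index pairs whose polygons overlap with positive area.

    Sweep-and-prune on the a-axis: only polygons whose a-extents overlap are
    clipped exactly. Still O(n^2) in the worst case.
    """
    order = sorted(range(len(polys)), key=lambda k: min(p[0] for p in polys[k]))
    active: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for k in order:
        start = min(p[0] for p in polys[k])
        # Retire polygons whose a-extent ends before this one begins
        active = [m for m in active if max(p[0] for p in polys[m]) >= start]
        for m in active:
            if polygon_area(clip_convex(polys[k], polys[m])) > 0:
                pairs.append((min(k, m), max(k, m)))
        active.append(k)
    return sorted(pairs)


# Hypothetical read pairs: (x, y) mapped ends, fragment lengths 200..400 bp
reads = [(100, 5000), (150, 5050), (9000, 20000)]
polys = [breakpoint_polygon(x, y, 200, 400) for x, y in reads]
print(intersecting_pairs(polys))  # -> [(0, 1)]
```

The first two hypothetical reads produce overlapping trapezoids and could therefore support the same deletion breakpoint, while the third lies far away and intersects neither. A fuller treatment would also handle the other read orientations and replace the pairwise clipping with a more efficient sweep, such as the algorithm described in the abstract.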