Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome

Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, which are crucial for associating genotype with phenotype, have been lacking, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporating nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, “active” approach that scores with a preliminary model, performs targeted validations, retrains the model, and then rescores, together with a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many CNVs that had previously been mapped only approximately. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of ≈300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.
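To make the idea of a discrete-valued, bivariate hidden Markov model concrete, the following is a minimal sketch and not the BreakPtr implementation. Each genomic bin is assumed to carry two discretized observations: a hybridization log-ratio bin (0 = low, 1 = neutral, 2 = high) and a sequence flag (1 = overlaps a segmental duplication). The three states, all probabilities, the conditional-independence factorization of the two observations, and the function names are illustrative assumptions. Decoding uses the standard Viterbi algorithm, and a breakpoint is called wherever the decoded state changes.

```python
import math

# Hypothetical copy-number states; not the state set used by BreakPtr.
STATES = ["normal", "deletion", "duplication"]
START = {"normal": 0.90, "deletion": 0.05, "duplication": 0.05}

# Assumed emission tables (illustrative values only).
# Ratio bins: 0 = low, 1 = neutral, 2 = high.
RATIO_EMIT = {
    "normal": [0.10, 0.80, 0.10],
    "deletion": [0.80, 0.15, 0.05],
    "duplication": [0.05, 0.15, 0.80],
}
# Sequence flag: 0 = unique sequence, 1 = segmental duplication.
SEQ_EMIT = {
    "normal": [0.9, 0.1],
    "deletion": [0.7, 0.3],
    "duplication": [0.7, 0.3],
}


def make_transitions(stay=0.9):
    """Build a transition table that favors remaining in the same state."""
    leave = (1.0 - stay) / (len(STATES) - 1)
    return {a: {b: (stay if a == b else leave) for b in STATES} for a in STATES}


def emission(state, obs):
    """Joint probability of a (ratio_bin, seq_flag) pair given the state.

    Assumes the two observations are independent given the state.
    """
    ratio_bin, seq_flag = obs
    return RATIO_EMIT[state][ratio_bin] * SEQ_EMIT[state][seq_flag]


def viterbi(observations, trans):
    """Return the most probable state path, computed in log space."""
    score = {
        s: math.log(START[s]) + math.log(emission(s, observations[0]))
        for s in STATES
    }
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(trans[p][s]))
            pointers[s] = best
            score[s] = (
                prev[best] + math.log(trans[best][s]) + math.log(emission(s, obs))
            )
        back.append(pointers)
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def breakpoints(path):
    """Indices of bins at which the decoded state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy data: a run of low ratios flanked by segmental-duplication bins.
obs = [(1, 0)] * 4 + [(0, 1)] + [(0, 0)] * 4 + [(1, 1)] + [(1, 0)] * 4
path = viterbi(obs, make_transitions())
print(breakpoints(path))  # [4, 9]
```

In this toy example the sequence flag only changes the emission probabilities near the edges of the low-ratio run. A model that conditions transitions on sequence features, or that uses a full joint emission table, would be a natural extension, but nothing here should be read as describing how BreakPtr parameterizes either.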

[18]  Thomas E. Royce,et al.  Global Identification of Human Transcribed Sequences with Genome Tiling Arrays , 2004, Science.

[19]  김삼묘,et al.  “Bioinformatics” 특집을 내면서 , 2000 .

[20]  Genes, chromosomes & cancer , 1995 .

[21]  Deborah A Nickerson,et al.  High-throughput genotyping of intermediate-size structural variation. , 2006, Human molecular genetics.

[22]  L. Feuk,et al.  Structural variation in the human genome , 2006, Nature Reviews Genetics.

[23]  K. Pearson,et al.  Biometrika , 1902, The American Naturalist.

[24]  R. Rosenfeld Nature , 2009, Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery.

[25]  R. Redon,et al.  Genome assembly comparison identifies structural variants in the human genome , 2006, Nature Genetics.

[26]  M. Wigler,et al.  Circular binary segmentation for the analysis of array-based DNA copy number data. , 2004, Biostatistics.

[27]  Geoffrey B. Nilsen,et al.  Whole-Genome Patterns of Common DNA Variation in Three Human Populations , 2005, Science.

[28]  L. Feuk,et al.  Detection of large-scale variation in the human genome , 2004, Nature Genetics.

[29]  Enrico Petretto,et al.  Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans , 2006, Nature.

[30]  Proceedings of the IEEE , 2018, IEEE Journal of Emerging and Selected Topics in Power Electronics.

[31]  D. Conrad,et al.  Global variation in copy number in the human genome , 2006, Nature.

[32]  D. W. Scott On optimal and data based histograms , 1979 .

[33]  B. Rovin,et al.  The Influence of CCL 3 L 1 Gene – Containing Segmental Duplications on HIV-1 / AIDS Susceptibility , 2009 .

[34]  M. Olivier A haplotype map of the human genome , 2003, Nature.

[35]  Jane Fridlyand,et al.  Bioinformatics Original Paper a Comparison Study: Applying Segmentation to Array Cgh Data for Downstream Analyses , 2022 .

[36]  T. Richmond,et al.  Analysis of chromosome breakpoints in neuroblastoma at sub‐kilobase resolution using fine‐tiling oligonucleotide array CGH , 2005, Genes, chromosomes & cancer.

[37]  K. Frazer,et al.  Common deletions and SNPs are in linkage disequilibrium in the human genome , 2006, Nature Genetics.

[38]  Pawel Stankiewicz,et al.  Genomic Disorders: Molecular Mechanisms for Rearrangements and Conveyed Phenotypes , 2005, PLoS genetics.