Detection of recurrent rearrangement breakpoints from copy number data

Background: Copy number variants (CNVs), including deletions, amplifications, and other rearrangements, are common in human and cancer genomes. Copy number data from array comparative genomic hybridization (aCGH) and next-generation DNA sequencing are widely used to measure copy number variants. Comparison of copy number data from multiple individuals reveals recurrent variants. Typically, the interior of a recurrent CNV is examined for genes or other loci associated with a phenotype. However, in some cases, such as gene truncations and fusion genes, the target of the variant lies at the boundary of the variant.

Results: We introduce Neighborhood Breakpoint Conservation (NBC), an algorithm for identifying rearrangement breakpoints that are highly conserved at the same locus in multiple individuals. NBC detects recurrent breakpoints at varying levels of resolution, including breakpoints whose location is exactly conserved and breakpoints whose location varies within a gene. NBC also identifies pairs of recurrent breakpoints, such as those that result from fusion genes. We apply NBC to aCGH data from 36 primary prostate tumors and identify 12 novel rearrangements, one of which is the well-known TMPRSS2-ERG fusion gene. We also apply NBC to 227 glioblastoma tumors and predict 93 novel rearrangements, which we further classify as gene truncations, germline structural variants, and fusion genes. A number of these variants involve the protein phosphatase PTPN12, suggesting that deregulation of PTPN12, via a variety of rearrangements, is common in glioblastoma.

Conclusions: We demonstrate that NBC is useful for detection of recurrent breakpoints resulting from copy number variants or other structural variants, and in particular identifies recurrent breakpoints that result in gene truncations or fusion genes. Software is available at http://cs.brown.edu/people/braphael/software.html.
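To make the idea of a recurrent breakpoint concrete, the following minimal sketch counts, for each probe position, how many individuals have a copy number change within a small neighborhood of that position. It is an illustrative simplification and not the NBC algorithm itself: the abstract does not specify NBC's model, so the function names, the `window` and `min_samples` parameters, and the toy profiles are all assumptions, and the input is assumed to be already segmented copy number profiles over a shared set of probes.

```python
from collections import Counter


def breakpoints(profile):
    """Return indices i where the copy number changes between probe i-1 and probe i."""
    return [i for i in range(1, len(profile)) if profile[i] != profile[i - 1]]


def recurrent_breakpoints(profiles, window=0, min_samples=2):
    """Count samples that have a breakpoint within `window` probes of each position.

    Hypothetical helper for illustration only: it uses hard breakpoint calls
    and simple counting, with no statistical model or significance test.
    """
    n_probes = len(profiles[0])
    counts = Counter()
    for profile in profiles:
        covered = set()
        for b in breakpoints(profile):
            lo = max(1, b - window)
            hi = min(n_probes - 1, b + window)
            covered.update(range(lo, hi + 1))
        # Each sample contributes at most once per position.
        counts.update(covered)
    return {pos: c for pos, c in sorted(counts.items()) if c >= min_samples}


# Toy segmented profiles (one list per individual, one value per probe).
profiles = [
    [2, 2, 2, 1, 1, 1, 2, 2],
    [2, 2, 2, 1, 1, 2, 2, 2],
    [2, 2, 2, 2, 2, 2, 2, 2],
    [2, 2, 3, 3, 3, 3, 2, 2],
]

exact = recurrent_breakpoints(profiles, window=0)
nearby = recurrent_breakpoints(profiles, window=1)
print(exact)
print(nearby)
```

With `window=0` only exactly conserved breakpoints are counted, while a larger window also groups breakpoints whose location varies slightly between individuals, loosely mirroring the varying levels of resolution described above.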
