Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously

Background: Alterations in the number of copies of genomic DNA that are common or recurrent among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for common or recurrent CNA regions that affect only some subsets of the samples (without knowledge of the regions and subsets affected), but this is neglected by most methods.

Results: We have developed two methods to define recurrent CNA regions from aCGH data. Our methods are unique and qualitatively different from existing approaches: they detect regions over both the complete set of arrays and alterations that are common only to some subsets of the samples (i.e., alterations that might characterize previously unknown groups); they use probabilities of alteration as input and return probabilities of being a common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and that researchers can modify, as needed, thresholds of immediate interpretability and develop custom statistics to answer specific research questions.

Conclusion: These methods represent a qualitative advance in the location of recurrent CNA regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can facilitate the clustering of samples with respect to patterns of CNA. Ultimately, the methods developed can become important tools in the search for genomic regions harboring disease-critical genes.
