A Flexible Statistical Method for Detecting Genomic Copy-Number Changes Using Hidden Markov Models with Reversible Jump MCMC

We have developed a statistical method for the analysis of array-based CGH data to detect genomic DNA copy number changes. Our method allows us to answer the biologically relevant question (what is the probability that a given gene or region has increased or decreased copy number?) in a clear and simple way, within a rigorous statistical framework. We use a non-homogeneous Hidden Markov Model that incorporates the distance between genes, a crucial requirement for analyzing data from platforms where the distances between probes are highly variable. Because the true number of hidden states (copy number states) is not known in advance in biological samples, we do not fix the number of hidden states of the model, but instead use Reversible Jump Markov Chain Monte Carlo for inference. We can therefore investigate the likely number of hidden states in the data and, more importantly, provide posterior probabilities that a gene or a set of genes is in a given state. To summarize results, we employ Bayesian Model Averaging, averaging over models with different numbers of hidden states and thus incorporating model uncertainty. Our method can be used to analyze data from each chromosome independently or from all chromosomes together, offering both flexibility in the biological phenomena studied and increased statistical precision. Thus, our method provides a rigorous statistical foundation for locating genes and chromosomal regions that have altered copy number and are potentially related to cancer and other complex diseases.
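To make the idea of a non-homogeneous Hidden Markov Model more concrete, the following minimal sketch shows one way a transition matrix could depend on the distance between adjacent probes. It is an illustration only and not the parameterization used in the method: the function name `transition_matrix`, the exponential decay form, and the parameters `stay_near` and `length_scale` are all assumptions introduced here. The point it illustrates is that probes lying close together are likely to share a copy number state, while widely separated probes carry little information about each other.

```python
import math


def transition_matrix(n_states, distance, stay_near=0.99, length_scale=1.0e6):
    """Return a row-stochastic transition matrix for two adjacent probes.

    Hypothetical parameterization (not taken from the paper):
    the probability of staying in the same hidden state decays
    exponentially from `stay_near` (distance 0) towards the uniform
    value 1 / n_states (very large distance).
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if n_states == 1:
        return [[1.0]]

    uniform = 1.0 / n_states
    # "Memory" of the previous state fades as the probes get farther apart.
    memory = math.exp(-distance / length_scale)
    stay = uniform + (stay_near - uniform) * memory
    # Remaining probability mass is split evenly among the other states.
    leave = (1.0 - stay) / (n_states - 1)

    return [
        [stay if i == j else leave for j in range(n_states)]
        for i in range(n_states)
    ]


# Example: three hidden states (for instance loss, no change, gain).
near = transition_matrix(3, distance=1.0e3)   # probes 1 kb apart
far = transition_matrix(3, distance=5.0e7)    # probes 50 Mb apart
```

With these assumed defaults, `near[0][0]` stays close to `stay_near`, whereas `far[0][0]` is close to 1/3, so the chain is effectively restarted across large gaps. A homogeneous model would use the same matrix for every pair of adjacent probes regardless of spacing.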
