Accounting for uncertainty when assessing association between copy number and disease: a latent class model

Background: Copy number variations (CNVs) may play an important role in disease risk by altering the dosage of genes and other regulatory elements, which may have functional and, ultimately, phenotypic consequences. Determining whether a CNV is associated with a given disease is therefore relevant to understanding the genesis and progression of human diseases. Current technologies yield a CNV probe signal from which copy number status is inferred, so incorporating the uncertainty of CNV calling into the statistical analysis is important. In this paper, we present a framework for assessing association between CNVs and disease in case-control studies that takes this uncertainty into account. We also indicate how to use the model to analyze continuous traits and to adjust for confounding covariates.

Results: Through simulation studies, we show that our method outperforms simpler methods that first infer the underlying CNV and then assess association using regular tests that do not propagate call uncertainty. We apply the method to a real data set from a controlled MLPA experiment, with good results. The methodology is also extended to illustrate how to analyze aCGH data.

Conclusion: We demonstrate that our method is robust and achieves maximal theoretical power since it accommodates uncertainty when copy number status is inferred. We have made R functions freely available.
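To illustrate the general idea of propagating call uncertainty, the following is a minimal sketch in Python, not the authors' R functions. It assumes each individual comes with a vector of copy number call probabilities (rows of `w`, summing to 1), assigns one disease risk per copy number class, and fits those risks by a simple EM algorithm. The function names, the saturated one-risk-per-class parameterization, and the EM fitting are all assumptions made for this example.

```python
import numpy as np

def log_lik(y, w, p):
    """Log-likelihood of case status y, marginalised over the latent copy number class.

    y: (n,) array of 0/1 case status
    w: (n, K) array of call probabilities per copy number class (assumed given)
    p: (K,) array of disease risk per class
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    p = np.asarray(p, dtype=float)
    # P(y_i | class k) for every individual and class
    per_class = np.where(y[:, None] == 1, p, 1.0 - p)
    # Mixture over classes, weighted by the call probabilities
    return float(np.sum(np.log(np.sum(w * per_class, axis=1))))

def fit_latent_class(y, w, n_iter=200, eps=1e-9):
    """EM estimate of the per-class disease risks (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Start from the null model: the same risk in every class
    p = np.full(w.shape[1], np.clip(y.mean(), eps, 1.0 - eps))
    for _ in range(n_iter):
        # E-step: class membership probabilities given the call and the outcome
        r = w * np.where(y[:, None] == 1, p, 1.0 - p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted proportion of cases in each class
        p = np.clip((r * y[:, None]).sum(axis=0) / r.sum(axis=0), eps, 1.0 - eps)
    return p

def lr_statistic(y, w):
    """Likelihood ratio statistic: class-specific risks versus a common risk."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    p_alt = fit_latent_class(y, w)
    p_null = np.full(w.shape[1], y.mean())
    return 2.0 * (log_lik(y, w, p_alt) - log_lik(y, w, p_null))
```

When every row of `w` is a 0/1 indicator (calls made with certainty), the fitted risks reduce to the observed proportion of cases in each called class, so this sketch includes the naive "call first, then test" analysis as a special case.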
