A Statistical Method for Identifying Trait-Associated Copy Number Variants

Copy number variants (CNVs), ranging in size from about one kilobase to several megabases, are DNA alterations that leave a cell with fewer or more than two copies of a segment of DNA. CNVs have been shown to be associated with many complex phenotypes, ranging from diseases to gene expression. Novel methods have been developed for identifying CNVs at both the individual and the population level. However, methods for testing CNV association are limited. Most available methods employ a two-step approach, in which the CNVs carried by the samples are identified first and then tested for association. The results of such tests depend on the threshold used for CNV identification and on the number of CNVs to be tested. We developed a method, CNVtest, to directly identify trait-associated CNVs without the need to identify sample-specific CNVs. We show that CNVtest asymptotically controls the type I error rate and identifies true trait-associated CNVs with high probability. We demonstrate the method using simulations and an application to identifying CNVs that are associated with population differentiation.
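The threshold dependence of the two-step approach can be illustrated with a minimal sketch. This is not CNVtest and not any published caller: the intensity values, the thresholds, and the function names (`call_carriers`, `contingency_table`, `fisher_one_sided`) are all hypothetical, chosen only to show that the carrier counts, and hence the association test, change with the calling threshold.

```python
from math import comb

# Hypothetical toy data: mean log-ratio intensity over one candidate region,
# one value per sample. The numbers are made up for illustration only.
cases = [-0.62, -0.48, -0.35, -0.31, -0.05, 0.02, 0.04, -0.02]
controls = [-0.33, -0.06, 0.01, 0.03, -0.04, 0.05, 0.00, -0.01]


def call_carriers(values, threshold):
    """Step 1: call a deletion carrier when the intensity is below the threshold."""
    return [v < threshold for v in values]


def contingency_table(case_values, control_values, threshold):
    """Build the 2x2 table ((case carriers, case non-carriers),
    (control carriers, control non-carriers)) for a given threshold."""
    a = sum(call_carriers(case_values, threshold))
    c = sum(call_carriers(control_values, threshold))
    return (a, len(case_values) - a), (c, len(control_values) - c)


def fisher_one_sided(table):
    """Step 2: one-sided Fisher exact test for carrier enrichment in cases,
    computed as the upper tail of the hypergeometric distribution."""
    (a, b), (c, d) = table
    n_cases = a + b
    n_carriers = a + c
    total = n_cases + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, n_cases)


# Same data, two assumed calling thresholds.
strict = contingency_table(cases, controls, threshold=-0.40)
lenient = contingency_table(cases, controls, threshold=-0.30)

for name, table in (("strict", strict), ("lenient", lenient)):
    print(name, table, round(fisher_one_sided(table), 4))
```

With the same samples, the two thresholds produce different carrier counts and therefore different p-values, which is the sensitivity that a direct test avoids by not calling sample-specific CNVs first.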
