Assessment of copy number variation using the Illumina Infinium 1M SNP‐array: a comparison of methodological approaches in the Spanish Bladder Cancer/EPICURO study

High‐throughput single nucleotide polymorphism (SNP)‐array technologies make it possible to investigate copy number variants (CNVs) in genome‐wide scans, and specific calling algorithms have been developed to determine CNV location and copy number. We report the results of a reliability analysis comparing data from 96 pairs of samples processed with CNVpartition, PennCNV, and QuantiSNP for Illumina Infinium Human 1Million probe chip data. We also performed a validity assessment with multiplex ligation‐dependent probe amplification (MLPA) as the reference standard. The number of CNVs per individual varied according to the calling algorithm. Higher numbers of CNVs were detected in saliva than in blood DNA samples, regardless of the algorithm used. All algorithms showed low agreement, with a mean Kappa Index (KI) <66. PennCNV was the most reliable algorithm (KIw=98.96) when assessing the number of copies. Agreement in detecting CNVs was higher in blood than in saliva samples. When compared with MLPA, all algorithms performed poorly at identifying known copy number aberrations (sensitivity = 0.19–0.28). In contrast, specificity was very high (0.97–0.99). Once a CNV was detected, the number of copies was generally assessed correctly (sensitivity >0.62). Our results indicate that the current calling algorithms should be improved for high‐performance CNV analysis in genome‐wide scans. Further refinement is required to assess CNVs as risk factors in complex diseases. Hum Mutat 32:1–10, 2011. © 2011 Wiley‐Liss, Inc.
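The agreement and validity measures named in the abstract can be illustrated with a minimal sketch. It assumes that KI denotes Cohen's kappa reported on a 0 to 100 scale and that sensitivity and specificity are computed per probed locus against the MLPA call; the abstract does not spell out either point, and the weighted variant (KIw) is not reproduced here. The function names and the toy inputs are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical calls."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    # Observed proportion of loci on which the two sets of calls agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance, from the marginal call frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both sequences consist of one identical category; treated here as
        # perfect agreement (a convention chosen for this sketch).
        return 1.0
    return (observed - expected) / (1.0 - expected)


def sensitivity_specificity(predicted, reference):
    """Sensitivity and specificity of binary calls against a reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (hypothetical values): CNV present (1) or absent (0) at four loci
# in a pair of duplicate samples.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
ki_percent = 100 * kappa  # assumed percentage scale for KI

# Toy example: algorithm calls versus a reference standard at four loci.
sens, spec = sensitivity_specificity(
    [True, False, False, False],  # algorithm calls
    [True, True, False, False],   # reference calls
)
```

With these definitions, a caller that misses most true CNVs but rarely reports false ones yields low sensitivity alongside high specificity, which is the pattern the abstract describes for the comparison against MLPA.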
