Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans

Background
High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will be essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays and analyzing the data with both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array against a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data.

Results
The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays, CNV detection varied widely in number of CNV calls (4–489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0–86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated by using different analysis software and optimizing data analysis parameters.

Conclusions
High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
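The benchmarking described above (the percentage of non-validated calls and the sensitivity against a gold standard) can be illustrated with a minimal sketch. The abstract does not state the matching criterion, so the 50% reciprocal overlap rule, the `(chromosome, start, end)` tuple representation, and the function names below are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
CNV = Tuple[str, int, int]


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def benchmark(calls: List[CNV], gold: List[CNV],
              min_overlap: float = 0.5) -> Dict[str, float]:
    """Compare an array's CNV calls with a gold standard set.

    min_overlap is an assumed threshold, not taken from the paper.
    """
    # Calls supported by at least one gold standard CNV.
    validated = [c for c in calls
                 if any(reciprocal_overlap(c, g) >= min_overlap for g in gold)]
    # Gold standard CNVs recovered by at least one call.
    detected = [g for g in gold
                if any(reciprocal_overlap(c, g) >= min_overlap for c in calls)]
    n_calls = len(calls)
    pct_non_validated = (100.0 * (n_calls - len(validated)) / n_calls
                         if n_calls else 0.0)
    sensitivity = len(detected) / len(gold) if gold else 0.0
    return {
        "n_calls": n_calls,
        "pct_non_validated": pct_non_validated,
        "sensitivity": sensitivity,
    }


if __name__ == "__main__":
    # Toy coordinates, invented for the example.
    calls = [("chr1", 100, 200), ("chr1", 1000, 2000), ("chr2", 50, 80)]
    gold = [("chr1", 110, 210), ("chr1", 5000, 6000)]
    print(benchmark(calls, gold))
```

Reporting both figures together reflects the trade-off the abstract points to: an array can produce many calls yet leave a large share of them non-validated, so neither number is informative on its own.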
