To the Editor:
In the November issue of the Journal, Slater et al. (2005) introduced a high-resolution method for the detection of chromosomal abnormalities using high-density synthetic oligonucleotide Affymetrix arrays containing 116,204 SNPs. The authors identified amplifications and deletions of different sizes (1.3–145.9 Mb) in patients by using SNP arrays in combination with the GeneChip Chromosome Copy Number Analysis Tool (CNAT), version 2.0 (Affymetrix). Comparative genomic hybridization and computational fosmid-end-mapping–based approaches have shown that large-scale chromosome copy-number polymorphisms (CNPs) substantially contribute to the genomic variation between normal human individuals (Iafrate et al. 2004; Sebat et al. 2004; Sharp et al. 2005; Tuzun et al. 2005). It has been proposed that CNPs might be associated with complex diseases, such as cancer, neurological disorders, autism, and obesity (Sebat et al. 2004; Check 2005).
Slater et al. (2005) suggested that it is highly likely that multiple SNPs cover CNP regions and could allow their detection. The algorithm (CNAT, version 2.0) that they used for the detection of chromosomal aberrations was developed using a reference set of 110 healthy individuals who also carry CNPs. Slater et al. proposed that the algorithm needs to be improved to detect CNPs. We suggest that an additional improvement of CNP detection should consider the selection criteria of SNPs for the array. The criteria used by Affymetrix consider Mendelian inheritance, Hardy-Weinberg equilibrium (HWE), genotyping accuracy, and reproducibility (Slater et al. 2005), which may lead to a selection of SNPs that is biased against CNP regions, and thus interferes with the detection of frequent CNPs. This limitation cannot be overcome with improvement of the algorithms. SNPs in CNP regions with frequent losses would lead to an accumulation of apparent Mendelian inheritance errors (e.g., if the genotypes of the parents are AA and B0 and the genotype of the child is A0) or deviations from HWE and thus would be rejected by the criteria. SNPs in frequently amplified genomic regions might produce genotype calls of reduced reproducibility (between homozygous and heterozygous calls, if an individual carries an “AAB” or “ABB” genotype) or might lead to Mendelian inheritance errors. An underrepresentation of SNPs in regions known to contain common CNPs will prevent the identification of these common CNPs, because information from multiple SNPs is required to establish a reliable detection.
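The Mendelian inheritance example above can be made concrete with a minimal sketch. It assumes that a genotyping array cannot see a deleted copy ("0") and therefore calls a hemizygous carrier as homozygous for the remaining allele. The function names are hypothetical and are not part of CNAT or any Affymetrix software; only the copy-loss case is modeled.

```python
from itertools import product

def called_genotype(true_alleles):
    """Genotype an array would report if deleted copies ("0") are invisible.

    `true_alleles` is a pair such as ("A", "0"). Assumption: a hemizygous
    carrier is called homozygous for the allele that is still present.
    """
    present = sorted(a for a in true_alleles if a != "0")
    if not present:
        return None  # both copies deleted: no genotype call
    if len(present) == 1:
        present = present * 2  # "A0" is read as "AA"
    return "".join(present)

def mendelian_consistent(parent1, parent2, child):
    """True if the child's called genotype can be formed from the parents' calls."""
    possible = {"".join(sorted(a + b)) for a, b in product(parent1, parent2)}
    return child in possible

# True genotypes from the example in the text: parents AA and B0, child A0.
p1 = called_genotype(("A", "A"))   # called "AA"
p2 = called_genotype(("B", "0"))   # called "BB"
kid = called_genotype(("A", "0"))  # called "AA"
print(p1, p2, kid, mendelian_consistent(p1, p2, kid))
```

Under these assumptions the trio is called AA, BB, and AA, which looks like a Mendelian inheritance error although the true genotypes are fully consistent.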
To test this hypothesis, we determined the SNP coverage of the most-frequent CNP regions (frequency >0.20) published by Iafrate et al. (2004), Sebat et al. (2004), Tuzun et al. (2005), and Sharp et al. (2005) with SNPs on the Affymetrix GeneChip Mapping 100K Array set. Data of 82 CNP regions were retrieved from the Database of Genomic Variants (representing 12.8% of all CNPs in the database [Iafrate et al. 2004]), and the corresponding SNP data were retrieved from the University of California–Santa Cruz (UCSC) Genome Browser (see Web Resources). The mean intermarker distance (ID) of the Affymetrix 100K SNPs located within the borders of each investigated CNP region was determined. After the exact location of the CNPs was mapped, the mean ID was calculated by dividing the length of each CNP region by the number of 100K array SNPs located within the region plus 1. In cases in which a CNP was not covered by any SNP, the mean ID therefore equaled the CNP length. Of all analyzed CNP regions, 58.5% contained at least one known gene, and all investigated CNPs except one (chr2-cent-2p11.2) were located outside telomeric or centromeric regions. All investigated CNPs with detailed annotations are listed in an HTML file (online only).
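The mean ID calculation described above can be sketched as follows. The function name and the coordinates in the example are hypothetical and serve only to illustrate the arithmetic; treating SNPs that lie exactly on a border as inside the region is an assumption.

```python
def mean_intermarker_distance(cnp_start, cnp_end, snp_positions):
    """Return (mean ID in bp, number of SNPs inside the CNP region).

    Mean ID = region length / (number of SNPs inside the region + 1),
    so a region without any SNP gets a mean ID equal to its own length.
    """
    length = cnp_end - cnp_start
    # Assumption: SNPs exactly on a border count as inside the region.
    n_inside = sum(cnp_start <= pos <= cnp_end for pos in snp_positions)
    return length / (n_inside + 1), n_inside

# Hypothetical 268-kb region; the last SNP lies outside its borders.
snps = [1_050_000, 1_100_000, 1_200_000, 2_000_000]
print(mean_intermarker_distance(1_000_000, 1_268_000, snps))  # three SNPs inside
print(mean_intermarker_distance(1_000_000, 1_268_000, []))    # no SNP inside
```

With three SNPs inside, the hypothetical region is split into four intervals of 67 kb on average; with none, the mean ID is the full 268 kb.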
Indeed, 81.7% of the investigated CNP regions had a mean ID larger than the overall mean ID of all SNPs on the array (23.6 kb), and 95.1% of the investigated CNP regions had a mean ID larger than the overall median ID of all SNPs on the array (8.5 kb) (table 1). We divided the CNPs into four groups according to their SNP coverage: 0 SNPs (52.4% of CNPs), 1–4 SNPs with mean ID >23.6 kb (33.0%), >4 SNPs with mean ID >23.6 kb (6.1%), and >4 SNPs with mean ID ⩽23.6 kb (8.5%) (table 1). Thus, only 14.6% of all investigated CNP regions were covered with >4 SNPs on the array and might be detectable, although half of them had a mean ID >23.6 kb. All other analyzed CNP regions (85.4%) were not covered with SNPs or were too sparsely covered with SNPs to achieve an appropriate detectability. The stratification of CNPs according to the different kinds of copy-number variation (loss, gain, or both) revealed that the majority of CNPs with losses (65.6%) or with both losses and gains (57.6%) were not covered by SNPs at all (fig. 1). Most of the CNPs with gains (64.7%) were covered with only 1–4 SNPs with a mean ID >23.6 kb (fig. 1).
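The four coverage groups used above can be expressed as a simple rule on the number of SNPs in a region and its mean ID. The following is only an illustrative sketch: the function name is hypothetical, the input values are invented, and the combination of 1–4 SNPs with a mean ID of at most 23.6 kb, which does not appear among the reported groups, is given a separate label as an assumption.

```python
ARRAY_MEAN_ID_KB = 23.6  # overall mean ID of all SNPs on the array, from the text

def coverage_group(n_snps, mean_id_kb):
    """Assign a CNP region to one of the coverage groups described in the text."""
    if n_snps == 0:
        return "0 SNPs"
    sparse = mean_id_kb > ARRAY_MEAN_ID_KB
    if n_snps <= 4:
        if sparse:
            return "1-4 SNPs, mean ID > 23.6 kb"
        # Assumption: this combination is not among the reported groups.
        return "1-4 SNPs, mean ID <= 23.6 kb (not reported)"
    if sparse:
        return ">4 SNPs, mean ID > 23.6 kb"
    return ">4 SNPs, mean ID <= 23.6 kb"

# Invented example regions: (number of SNPs, mean ID in kb).
for region in [(0, 157.0), (3, 40.0), (6, 30.0), (10, 12.0)]:
    print(region, "->", coverage_group(*region))
```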
Figure 1
Coverage of the most-frequent CNP regions (frequency >0.20) identified by Iafrate et al. (2004), Sebat et al. (2004), Sharp et al. (2005), and Tuzun et al. (2005) with SNPs on the Affymetrix GeneChip Mapping 100K Array set. The regions were divided ...
Table 1
Coverage of the 82 Investigated Most-Frequent CNPs (Frequency >0.20) with SNPs on the Affymetrix GeneChip Mapping 100K Array Set
Sharp et al. (2005) recently suggested that segmental duplications may serve as catalysts for CNPs in the human genome. Segmental duplications are significantly enriched, more than fourfold, within regions of CNP. Indeed, 82.9% of the frequent CNPs investigated in the present study overlapped segmental duplications. Only five of the most-frequent CNP regions investigated in this study (22q11.22, 22q11.21, 19p13.2, 15q14, and 14q32.33) were detected by more than one group of authors (Iafrate et al. 2004; Sebat et al. 2004; Sharp et al. 2005; Tuzun et al. 2005; data in the HTML file [online only]). This points to the still-unknown significance of the CNPs identified so far (Carter 2004).
Slater et al. (2005) suggested 400 kb as the mean length of CNPs, on the basis of the Database of Genomic Variants. We show here that the most-frequent CNPs (frequency >0.20) investigated in the present study had a mean length of 268 kb and a median length of 157 kb (table 1). Notably, the CNP regions not covered by SNPs at all were smaller in size (mean length 120 kb; median length 141 kb). However, considering that 91% of the genome is suggested to be within 100 kb of a SNP (Slater et al. 2005), the majority of CNPs should have been covered by at least one SNP on the array.
In conclusion, oligonucleotide-based SNP arrays have been shown to be an excellent tool for analyses of loss of heterozygosity and rare copy-number variation (e.g., Zhao et al. 2004), association studies (e.g., Hu et al. 2005), linkage studies (e.g., Sellick et al. 2005), resequencing applications in humans and other organisms (e.g., Cutler et al. 2001; Maitra et al. 2004; Zwick et al. 2005), and the detection of recombination hotspots (e.g., Wirtenberger et al. 2005). However, the applicability might be somewhat limited with regard to the analysis of frequent CNPs, because of the initial SNP selection. High-density tiling arrays might be an appropriate tool for this kind of analysis. Chip manufacturers may be able to change their SNP selection criteria and provide an updated chip-description file that includes information on the artificially masked SNPs that do not fulfill the selection criteria. But, until they do so, users of high-density SNP arrays in association studies of common diseases should be aware of this limitation.
[1] Robert Henke, et al. High-resolution identification of chromosomal abnormalities using oligonucleotide arrays containing 116,204 SNPs. American Journal of Human Genetics, 2005.
[2] E. Check. Human genome: Patchwork people. Nature, 2005.
[3] M. Dyer, et al. A high-density SNP genomewide linkage scan for chronic lymphocytic leukemia-susceptibility loci. American Journal of Human Genetics, 2005.
[4] E. Eichler, et al. Segmental duplications and copy-number variation in the human genome. American Journal of Human Genetics, 2005.
[5] K. Hemminki, et al. SNP microarray analysis for genome-wide detection of crossover regions. Human Genetics, 2005.
[6] E. Eichler, et al. Fine-scale structural variation of the human genome. Nature Genetics, 2005.
[7] Nan Hu, et al. Genome-wide association study in esophageal cancer using GeneChip mapping 10K array. Cancer Research, 2005.
[8] D. Cutler, et al. Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biology, 2004.
[9] L. Feuk, et al. Detection of large-scale variation in the human genome. Nature Genetics, 2004.
[10] N. Carter. As normal as normal can be? Nature Genetics, 2004.
[11] Kenny Q. Ye, et al. Large-Scale Copy Number Polymorphism in the Human Genome. Science, 2004.
[12] Luc Girard, et al. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Research, 2004.
[13] A. Chakravarti, et al. The Human MitoChip: a high-throughput sequencing microarray for mitochondrial mutation detection. Genome Research, 2004.
[14] A. Chakravarti, et al. High-throughput variation detection and genotyping using microarrays. Genome Research, 2001.