SUMMARY
Motivation: Insufficient reliability of expression measurements is a key problem facing microarray experiments. The problem could originate from poor gene identification by probe sequences whose design may not account for the actual complexity of the human genome.
Results: We re-estimated the genomic localization of the (initial) target sequences of the Affymetrix U133A and U133B GeneChips. We matched these sequences to genes and transcripts in the human genome. This resulted in a significant redefinition of the specificity and uniqueness of more than 2500 GeneChip probesets. Among the remaining target sequences, approximately one quarter overlapped with interspersed repeats, which could cause cross-hybridization signals and errors in expression measurements. To test that hypothesis, we compared GeneChip microarray data from large groups of breast cancer patients that differed in the aggressiveness of tumor growth. In particular, for low- and high-aggressiveness tumors, we demonstrated that, among the set of differentially expressed genes, probesets with repeat-overlapping target sequences were statistically significantly underrepresented compared with probesets with repeat-free target sequences. In addition, 407 Affymetrix target sequences were incorrectly oriented relative to the genes they purportedly represented (anti-sense transcripts). Surprisingly, a large fraction of these "erroneous" sequences can be significantly associated with important regulatory biological processes, molecular functions, and pathways. All defined categories of probe sequences have been annotated in our local Affy Probes Mapping and Annotation (APMA) database. Our results allow us to re-identify many targets used in a microarray experiment and to carry out a biological classification of the anti-sense transcripts.