Quality Assessment of the Affymetrix U133A&B Probesets by Target Sequence Mapping and Expression Data Analysis

Careful analysis of microarray probe design should be an obligatory component of the MicroArray Quality Control (MAQC) project [Patterson et al., 2006; Shi et al., 2006]. The US FDA initiated this project to provide quality control tools to researchers working with gene expression profiles and to help translate microarray technology from bench to bedside. Identifying and filtering unreliable probesets are important preprocessing steps in microarray data analysis. These steps can substantially improve the selection of differentially expressed genes, gene clustering and the construction of co-regulatory expression networks. We revised the genomic localization of the initial (target) probe sequences of the Affymetrix U133A&B GeneChips and evaluated the impact of erroneous and poorly annotated target sequences on the quality of gene expression data. We found that about 25% of Affymetrix target sequences overlap interspersed repeats, which could cause cross-hybridization. In total, discrepancies in target sequence annotation affect up to approximately 30% of the 44,692 Affymetrix probesets. We introduce a novel quality control algorithm based on mapping target sequences onto the genome and on analysis of GeneChip expression data. To validate probeset quality, we used expression data from large, clinically and genetically distinct groups of breast cancers (249 samples). For the first time, we quantitatively evaluated the effect of repeats and other sources of inadequate probe design on the specificity, reliability and discriminating ability of Affymetrix probesets. We propose that only the functionally reliable probesets that pass our quality control algorithm (approximately 86%) be used for gene expression analysis. The target sequence annotation and filtering results are available upon request.
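The abstract does not spell out the algorithm. As a rough illustration of only the sequence-level part of such a filter, the following minimal sketch flags probesets whose target sequence is unmapped, maps to more than one genomic locus, or overlaps an interspersed repeat. It assumes that target-sequence alignments and RepeatMasker-style repeat intervals have already been computed. The `Interval` type, the "exactly one locus, no repeat overlap" rule, the function names and the probeset IDs are all hypothetical, and the expression-data half of the authors' quality control is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome (hypothetical type)."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: Interval) -> bool:
        # Half-open intervals on the same chromosome overlap
        # when each one starts before the other ends.
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def passes_sequence_qc(hits: list[Interval], repeats: list[Interval]) -> bool:
    """Hypothetical rule: keep a probeset only if its target sequence maps to
    exactly one genomic locus and that locus overlaps no repeat interval."""
    if len(hits) != 1:
        return False  # unmapped or multi-mapping target sequence
    locus = hits[0]
    return not any(locus.overlaps(r) for r in repeats)


def filter_probesets(target_hits: dict[str, list[Interval]],
                     repeats: list[Interval]) -> list[str]:
    """Return the IDs of probesets that pass the sequence-level check."""
    return [pid for pid, hits in target_hits.items()
            if passes_sequence_qc(hits, repeats)]


if __name__ == "__main__":
    # Toy coordinates and IDs, invented for illustration only.
    repeats = [Interval("chr1", 1_000, 1_300)]
    target_hits = {
        "ps_unique_clean": [Interval("chr1", 5_000, 5_600)],
        "ps_in_repeat": [Interval("chr1", 1_200, 1_800)],
        "ps_unmapped": [],
        "ps_multi": [Interval("chr2", 10, 600), Interval("chr3", 10, 600)],
    }
    kept = filter_probesets(target_hits, repeats)
    print(kept)  # ['ps_unique_clean']
    print(f"{len(kept) / len(target_hits):.0%} of probesets retained")  # 25%
```

A linear scan over repeats keeps the sketch short. For genome-scale repeat tracks, an interval tree or a sorted sweep per chromosome would be the usual choice.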

[1] Xiaoqiu Huang, et al. Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Research, 2004.

[2] P. Hall, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proceedings of the National Academy of Sciences of the United States of America, 2005.

[3] Hui Sun Leong, et al. ADAPT: a database of Affymetrix probesets and transcripts. Bioinformatics, 2005.

[4] R. Tibshirani, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[5] S. Batalov, et al. Antisense Transcription in the Mammalian Transcriptome. Science, 2005.

[6] Vladimir A. Kuznetsov, et al. Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes. 2006.

[7] Vladimir A. Kuznetsov, et al. A comprehensive quality assessment of the Affymetrix U133A&B probesets by an integrative genomic and clinical data analysis approach. 2006.

[8] Joshy George, et al. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Research, 2006.

[9] V. A. Kuznetsov, et al. Genome-wide co-expression patterns of human cis-antisense gene pairs. 2006.

[10] MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology, 2006.

[11] Guoying Liu, et al. NetAffx: Affymetrix probesets and annotations. Nucleic Acids Research, 2003.

[12] P. Collins, et al. Performance comparison of one-color and two-color platforms within the Microarray Quality Control (MAQC) project. Nature Biotechnology, 2006.

[13] Luquan Wang, et al. Genome wide in silico SNP-tumor association analysis. BMC Cancer, 2004.

[14] W. Gerald, et al. A Genome-Wide Screen for Promoter Methylation in Lung Cancer Identifies Novel Methylation Markers for Multiple Malignancies. PLoS Medicine, 2006.

[15] Boris Lenhard, et al. Antisense Transcription in the Mammalian Transcriptome. RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) and the FANTOM Consortium, 2005.

[16] Vladimir A. Kuznetsov, et al. Pareto-Gamma Statistic Reveals Global Rescaling in Transcriptomes of Low and High Aggressive Breast Cancer Phenotypes. PRIB, 2006.

[17] X. Shirley Liu, et al. Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species. Nucleic Acids Research, 2006.

[18] S. Enkemann, et al. A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Research, 2005.

[19] Chunlei Wu, et al. Sequence dependence of cross-hybridization on short oligo microarrays. Nucleic Acids Research, 2005.

[20] Michal J. Okoniewski, et al. Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations. BMC Bioinformatics, 2006.

[21] Erez Y. Levanon, et al. Widespread occurrence of antisense transcription in the human genome. Nature Biotechnology, 2003.

[22] Zoltan Szallasi, et al. Increased measurement accuracy for sequence-verified microarray probes. Physiological Genomics, 2004.

[23] R. Myers, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research, 2005.

[24] Maria A. Stalteri, et al. Give me shelter: the global housing crisis. BMC Bioinformatics, 2003.

[25] Kenneth H. Buetow, et al. Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics, 2005.

[26] Hanlee P. Ji, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology, 2006.

[27] Steen Knudsen, et al. Alternative mapping of probes to genes for Affymetrix chips. BMC Bioinformatics, 2004.