Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips

We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, such as those held in NCBI's Gene Expression Omnibus (GEO). GEO samples data from many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlation between probe intensities can be attributed either to robust biology, such as common co-expression, or to systematic biases in the GeneChip technology. Our bioinformatics pipeline integrates the mapping of probes to exons, quality control checks on each GeneChip that identify flaws in hybridization quality, and the mining of correlations in intensities between groups of probes. The output from the pipeline has enabled us to identify systematic biases in GeneChip data, and the pipeline also serves as a discovery tool for biology. We have discovered that, in the majority of cases, Affymetrix probesets on Human GeneChips do not measure one unique block of transcription; instead, we see numerous examples of outlier probes. Our study has also identified a number of probesets in which the mismatch probes are an informative diagnostic of expression rather than a measure of background contamination. We report evidence for systematic biases in GeneChip technology associated with probe-probe interactions. We also see signatures associated with post-transcriptional processing of RNA, such as alternative polyadenylation.
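The correlation-mining step described above can be illustrated with a minimal sketch. It is not the pipeline itself: it assumes the intensities for one probeset are already available as a probes-by-arrays matrix, uses Pearson correlation on log2 intensities, and flags a probe as an outlier when its median correlation with the other probes falls below a threshold. The function names and the 0.5 threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def probe_correlations(intensities):
    """Pearson correlation between probes of one probeset.

    intensities: array of shape (n_probes, n_arrays) holding positive raw
    intensities, one row per probe and one column per GeneChip.
    """
    log_intensities = np.log2(np.asarray(intensities, dtype=float))
    # np.corrcoef treats each row as one variable (here, one probe).
    return np.corrcoef(log_intensities)


def flag_outlier_probes(intensities, min_median_corr=0.5):
    """Return indices of probes that do not track the rest of the probeset.

    The threshold is an illustrative assumption, not a value from the study.
    """
    corr = probe_correlations(intensities)
    n_probes = corr.shape[0]
    # Drop the diagonal (self-correlation of 1) before summarising each row.
    off_diagonal = corr[~np.eye(n_probes, dtype=bool)].reshape(n_probes, n_probes - 1)
    median_corr = np.median(off_diagonal, axis=1)
    return [i for i in range(n_probes) if median_corr[i] < min_median_corr]
```

Using the median rather than the mean keeps a single discordant probe from dragging down the score of the probes that do agree with each other. Applied across a large survey, a probeset that measures one block of transcription would be expected to return an empty list.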
