PANP - a New Method of Gene Detection on Oligonucleotide Expression Arrays

The most widely used method for making probeset detection calls on Affymetrix GeneChip® Human Genome Arrays is provided as part of the MAS5 software. The MAS method uses Wilcoxon statistics to determine presence-absence (MAS-P/A) calls. However, MAS-P/A is usable only with MAS5 processing, which requires both perfect match (PM) and mismatch (MM) probe data in order to call a probeset present or absent. A considerable amount of recent research has shown that using MM data in gene expression analysis may be problematic. The RMA method, which uses PM data only, is one method that has been developed in response. However, there is no publicly available method that works with PM-only expression data to establish the presence or absence of genes from the probesets in microarray data. It therefore seems desirable to decouple the method used to generate gene expression values from the method used to make gene detection calls. We have developed a statistical method in R, called presence-absence calls with negative probesets (PANP), which uses sets of Affymetrix-reported probes with no known hybridization partners on two chip sets: HG-U133A and HG-U133 Plus 2.0. PANP allows any Affymetrix microarray data preprocessing method to be used to generate expression values, including PM-only methods as well as methods that use both PM and MM data. We present results on the performance of PANP using the set of 28 HG-U133A chips from a published Affymetrix Latin squares spike-in dataset as well as an internal TaqMan-validated human tissue dataset on HG-U133 Plus 2.0 chips. We find that, on these datasets, PANP outperforms the MAS-P/A method on several metrics of accuracy and precision with a variety of preprocessing methods: RMA, GCRMA, and even MAS5 itself. PANP outperforms MAS-P/A in probeset detection across the full range of concentrations, especially for low-concentration transcripts.
An R software package has been prepared for PANP and is available in R as part of the Bioconductor package release at http://www.bioconductor.org.
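
The general idea of calling presence or absence against negative probesets can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the PANP R package or its API. It assumes that the expression values of the negative probesets are treated as an empirical null distribution and that a probeset is called present when only a small fraction of negative values reach its expression level. The function names, the probeset identifiers, and the cutoff `alpha` are hypothetical; the abstract does not specify PANP's actual statistic or thresholds.

```python
from bisect import bisect_left


def empirical_p_values(expression, negative_values):
    """Return, for each probeset, the fraction of negative-control
    expression values that are greater than or equal to its value.

    Assumption: the negative probesets act as an empirical null
    distribution. This is an illustrative choice, not PANP's
    documented procedure.
    """
    if not negative_values:
        raise ValueError("need at least one negative probeset value")
    neg = sorted(negative_values)
    n = len(neg)
    p_values = {}
    for name, value in expression.items():
        # bisect_left counts negative values strictly below `value`,
        # so the remainder are >= value.
        count_ge = n - bisect_left(neg, value)
        p_values[name] = count_ge / n
    return p_values


def detection_calls(expression, negative_values, alpha=0.05):
    """Call each probeset present ("P") or absent ("A").

    `alpha` is a hypothetical cutoff chosen for this sketch.
    """
    p_values = empirical_p_values(expression, negative_values)
    return {name: ("P" if p < alpha else "A") for name, p in p_values.items()}


if __name__ == "__main__":
    # Made-up numbers: 100 negative-control values and three probesets.
    negatives = [float(v) for v in range(1, 101)]
    expression = {"probeset_a": 150.0, "probeset_b": 50.0, "probeset_c": 100.0}
    print(detection_calls(expression, negatives))
```

Because the calls depend only on the expression values, the same procedure can be applied to the output of any preprocessing method, which is the decoupling the abstract argues for.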
