Empirical Bayes models for multiple probe type microarrays at the probe level

Background: When analyzing microarray data, a primary objective is often to find differentially expressed genes. With empirical Bayes and penalized t-tests, the sample variances are adjusted towards a global estimate, producing more stable results than ordinary t-tests. However, for Affymetrix type data a clear dependency between variability and intensity level generally exists, even for logged intensities. The dependency is clearest for data at the probe level, but it is also present for probe-set summaries such as the MAS5 expression index. As a consequence, adjustment towards a global estimate results in a false positive rate that depends on the intensity level.

Results: We propose two new methods for finding differentially expressed genes, Probe level Locally moderated Weighted median-t (PLW) and Locally Moderated Weighted-t (LMW). Both methods use an empirical Bayes model that takes the dependency between variability and intensity level into account. A global covariance matrix is also used, allowing for differing variances between arrays as well as array-to-array correlations. PLW is specially designed for Affymetrix type arrays (or other multiple-probe arrays). Instead of making inference on probe-set summaries, comparisons are made separately for each perfect-match probe and are then summarized into one score for the probe-set.

Conclusion: The proposed methods are compared to 14 existing methods using five spike-in data sets. For RMA and GCRMA processed data, PLW has the most accurate ranking of regulated genes in four out of the five data sets, and LMW consistently performs better than all examined moderated t-tests when used on RMA, GCRMA, and MAS5 expression indexes.
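The two ideas the abstract relies on, shrinking each sample variance towards a prior estimate and summarizing probe-level statistics by their median, can be illustrated with a minimal sketch. This is not PLW or LMW: it omits the weighting and the global covariance matrix, and the function names, `prior_var`, and `prior_df` are hypothetical. In practice the prior parameters would be estimated from the data; passing one `prior_var` per probe (for example, as a function of mean intensity) gives a locally moderated variant, while a single scalar gives the global adjustment.

```python
import numpy as np

def moderated_t(diffs, prior_var, prior_df):
    """Moderated one-sample t-statistic per probe (illustrative only).

    diffs     : (n_probes, n_replicates) array of log-scale differences.
    prior_var : scalar (global prior) or (n_probes,) array (local prior).
    prior_df  : prior degrees of freedom; 0 gives the ordinary t-statistic.
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.shape[1]
    df = n - 1
    mean = diffs.mean(axis=1)
    s2 = diffs.var(axis=1, ddof=1)  # ordinary sample variance per probe
    # Shrink the sample variance towards the prior variance.
    s2_mod = (prior_df * prior_var + df * s2) / (prior_df + df)
    return mean / np.sqrt(s2_mod / n)

def probe_set_score(t_values, probe_set_ids):
    """Summarize probe-level statistics into one median score per probe-set."""
    t_values = np.asarray(t_values, dtype=float)
    ids = np.asarray(probe_set_ids)
    return {str(ps): float(np.median(t_values[ids == ps]))
            for ps in np.unique(ids)}
```

With `prior_df` set to 0 the statistic reduces to the ordinary t-statistic, which makes the stabilizing effect of the prior easy to compare on probes with very small sample variances.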
