Bayesian Normalization and Identification for Differential Gene Expression Data

The commonly accepted intensity-dependent normalization for spotted microarray studies accounts for measurement error in the differential expression ratio but ignores measurement error in the total intensity, even though, by definition, the same measurement-error components enter both statistics. Furthermore, identification of differentially expressed genes is usually treated as a separate step after normalization, which is statistically problematic. By incorporating the measurement errors in both the total intensities and the differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. The model is flexible enough to incorporate intra-array and inter-array effects. We propose a Bayesian framework for analyzing this model, which avoids the potential risk of the common two-step procedure. We also propose a Bayesian procedure for identifying differentially expressed genes that controls the false discovery rate, in place of ad hoc thresholding of the posterior odds ratio. A simulation study and an application to real microarray data demonstrate promising results.
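
The abstract does not spell out how the false discovery rate is controlled, so the following is only a minimal sketch of one common posterior-probability rule, not necessarily the procedure used in the paper. It assumes that a fitted model has already produced, for each gene, a posterior probability of differential expression. The function name `bayesian_fdr_select`, its parameters, and the example probabilities are all hypothetical. Genes are ranked by posterior probability, and the largest top-ranked set whose average posterior probability of being a false call stays at or below the target level is flagged.

```python
import numpy as np


def bayesian_fdr_select(post_prob_de, alpha=0.05):
    """Flag genes so that the expected false discovery rate is at most alpha.

    post_prob_de: posterior probability of differential expression per gene
                  (assumed to come from an already fitted model).
    alpha:        target expected false discovery rate (hypothetical default).
    Returns a boolean mask with True for the selected genes.
    """
    p = np.asarray(post_prob_de, dtype=float)
    # Rank genes from most to least likely to be differentially expressed.
    order = np.argsort(-p, kind="stable")
    # Posterior probability that calling each ranked gene is a false discovery.
    prob_false = 1.0 - p[order]
    # Expected FDR of the top-k list is the mean of its first k entries.
    running_fdr = np.cumsum(prob_false) / np.arange(1, p.size + 1)
    # prob_false is sorted ascending, so running_fdr never decreases and the
    # number of entries at or below alpha is the largest admissible list size.
    n_selected = int(np.sum(running_fdr <= alpha))
    mask = np.zeros(p.size, dtype=bool)
    mask[order[:n_selected]] = True
    return mask


if __name__ == "__main__":
    # Made-up posterior probabilities for five genes, for illustration only.
    probs = [0.99, 0.2, 0.95, 0.6, 0.9]
    print(bayesian_fdr_select(probs, alpha=0.1))
```

Unlike a fixed cutoff on the posterior odds, the effective threshold here adapts to the data: the same `alpha` can admit more or fewer genes depending on how strongly the posterior probabilities separate.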
