A heavy-tailed empirical Bayes method for replicated microarray data

DNA microarrays are recognized as an important tool for studying the expression of thousands of genes simultaneously. These experiments allow two samples of cDNA obtained under different conditions to be compared. We present a novel method for the analysis of replicated microarray experiments that models the distribution of gene expression as a mixture of α-stable distributions. Some features of the distribution of gene expression, such as Pareto tails and the fact that the variance of any given array increases with the number of genes studied, suggest modelling it with an α-stable density. The proposed methodology uses well-known properties of α-stable distributions, such as their representation as a scale mixture of normals. A Bayesian log-posterior odds is calculated, which allows us to decide whether or not a gene is differentially expressed. The methodology is illustrated using simulated and experimental data, and the results are compared with those of other existing statistical approaches. The proposed heavy-tailed model performs better than models based on other distributions and is easily applicable to microarray gene data, especially if the dataset contains outliers or shows high variance between replicates.
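To make the heavy-tail behaviour concrete, the following minimal sketch draws standard symmetric α-stable variates with the Chambers-Mallows-Stuck transform and compares their tail mass with the Gaussian case (α = 2). It is an illustration only: the abstract does not state how the simulated data were generated, the unit-scale, zero-location parameterization is an assumption, and the names `sample_sas` and `tail_fraction` are hypothetical.

```python
import math
import random


def sample_sas(alpha, rng):
    """Draw one standard symmetric alpha-stable variate, 0 < alpha <= 2.

    Uses the Chambers-Mallows-Stuck transform with unit scale and zero
    location (an assumed parameterization, not taken from the paper).
    """
    u = (rng.random() - 0.5) * math.pi  # uniform on [-pi/2, pi/2)
    w = rng.expovariate(1.0)            # exponential with mean 1
    if alpha == 1.0:
        return math.tan(u)              # alpha = 1 is the Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def tail_fraction(alpha, threshold, n, seed):
    """Fraction of n draws whose absolute value exceeds the threshold."""
    rng = random.Random(seed)
    hits = sum(abs(sample_sas(alpha, rng)) > threshold for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # alpha = 2 reduces to a normal with variance 2; smaller alpha
    # puts visibly more mass far from the centre.
    for a in (2.0, 1.5, 1.0):
        print(a, tail_fraction(a, threshold=5.0, n=20000, seed=0))
```

In this parameterization α = 2 gives a normal distribution with variance 2, so lowering α is what produces the outlier-prone behaviour the method is designed to absorb.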
