A probabilistic framework for microarray data analysis: fundamental probability models and statistical inference.

Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the few available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble and developing theoretical distributions for such ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that, for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and we develop a procedure for using these results to draw statistical inferences regarding differential gene expression. We illustrate the results with experimental data from cDNA microarray studies of gene expression in Deinococcus radiodurans following DNA damage.
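The link between the Gamma model for intensities and the Beta model for fractional intensities can be illustrated with a standard probability fact: if two independent Gamma variables share a scale parameter, the ratio of one to their sum is Beta distributed. The sketch below is only an illustration under stated assumptions, not the authors' procedure. It assumes that a fractional intensity is one channel's intensity divided by the sum of both channels, and the shape and scale values and all function names are hypothetical. It simulates a single Beta component rather than a mixture and compares the simulated moments with the theoretical Beta moments.

```python
import random
import statistics


def fractional_intensities(shape_a, shape_b, scale, n, seed=0):
    """Simulate n fractional intensities a / (a + b).

    Assumption (not from the abstract): the two channel intensities are
    independent Gamma variables with a common scale parameter.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n):
        a = rng.gammavariate(shape_a, scale)  # channel 1 intensity
        b = rng.gammavariate(shape_b, scale)  # channel 2 intensity
        fractions.append(a / (a + b))
    return fractions


def beta_mean_var(a, b):
    """Theoretical mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical parameter values chosen only for illustration.
fractions = fractional_intensities(2.0, 3.0, 500.0, 20000)
sample_mean = statistics.fmean(fractions)
sample_var = statistics.pvariance(fractions)
beta_mean, beta_var = beta_mean_var(2.0, 3.0)

print(f"sample mean {sample_mean:.4f} vs Beta mean {beta_mean:.4f}")
print(f"sample var  {sample_var:.4f} vs Beta var  {beta_var:.4f}")
```

The common scale parameter cancels in the ratio, so the fractional intensity depends only on the two shape parameters. A mixture of Beta densities would arise if different groups of genes had different shape parameters.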
