Physically grounded approach for estimating gene expression from microarray data

High-throughput technologies, including gene-expression microarrays, hold great promise for the systems-level study of biological processes. Yet challenges remain in comparing microarray data from different sources and in extracting information about low-abundance transcripts. We demonstrate that these difficulties arise from limitations in how the data are modeled. We propose a physically motivated approach for estimating gene-expression levels from microarray data, an approach that has been neglected in the microarray literature. We separately model the noise specific to sample amplification, hybridization, and fluorescence detection, and combine these into a parsimonious description of the sources of variability in a microarray experiment. We find that our model produces estimates of gene expression that are reproducible and unbiased. While the details of our model are specific to gene-expression microarrays, we argue that the physically grounded modeling approach we pursue is broadly applicable to other molecular biology technologies.
