Statistical implications of pooling RNA samples for microarray experiments

Background: Microarray technology has become an important tool for studying gene expression profiles under various conditions. Biologists often pool RNA samples extracted from different subjects onto a single microarray chip, both to reduce the cost of microarray experiments and to overcome the technical difficulty of obtaining sufficient RNA from a single subject. However, the statistical, technical and financial implications of pooling have not been explicitly investigated.

Results: Modeling the gene expression measured from a pooled sample as a mixture of individual responses, we derived expressions for the experimental error and provided both upper and lower bounds for its value in terms of the variability among individuals and the number of RNA samples pooled. Using "virtual" pooling of data from real experiments and computer simulations, we investigated the statistical properties of RNA sample pooling. Our study reveals that pooling biological samples appropriately is statistically valid and efficient for microarray experiments. Furthermore, optimal pooling designs can be found that meet statistical requirements while minimizing total cost.

Conclusions: Appropriate RNA pooling can provide equivalent power and improve efficiency and cost-effectiveness for microarray experiments with a modest increase in the total number of subjects. Pooling schemes, in terms of replicates of subjects and arrays, can be compared before experiments are conducted.
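To illustrate how the variance of a pooled measurement depends on the variability among individuals and the number of samples pooled, the following is a minimal simulation sketch. It is not the paper's model or derivation. It assumes an equal-weight mixture (each pooled array measures the plain average of k individual expression values), independent normally distributed biological variation with standard deviation `sigma_b`, and additive normal technical error with standard deviation `sigma_e`. Under those assumptions the variance of one pooled array is `sigma_b**2 / k + sigma_e**2`. All function names and parameter values are hypothetical.

```python
import random
import statistics


def pooled_array_variance(sigma_b2, sigma_e2, k):
    """Variance of one pooled array under the assumed equal-weight model.

    sigma_b2: between-subject (biological) variance
    sigma_e2: technical (array) variance
    k:        number of subjects pooled onto one array
    """
    return sigma_b2 / k + sigma_e2


def simulate_pooled_arrays(rng, n_arrays, k, mu, sigma_b, sigma_e):
    """Simulate n_arrays pooled measurements, each averaging k subjects."""
    observations = []
    for _ in range(n_arrays):
        # Assumption: the pool is the unweighted mean of k individual values.
        pool = sum(rng.gauss(mu, sigma_b) for _ in range(k)) / k
        # Assumption: technical error is additive and independent of the pool.
        observations.append(pool + rng.gauss(0.0, sigma_e))
    return observations


rng = random.Random(0)
observations = simulate_pooled_arrays(
    rng, n_arrays=50_000, k=4, mu=8.0, sigma_b=1.0, sigma_e=0.5
)
empirical = statistics.variance(observations)
theoretical = pooled_array_variance(sigma_b2=1.0, sigma_e2=0.25, k=4)
print(f"empirical variance:   {empirical:.3f}")
print(f"theoretical variance: {theoretical:.3f}")
```

In this sketch, pooling reduces only the biological component of the variance; the technical component is unchanged, which is why the number of arrays still matters when comparing pooling schemes.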
