Mixed modeling and sample size calculations for identifying housekeeping genes

Normalization of gene expression data using internal control genes that have biologically stable expression levels is an important step in analyzing reverse transcription polymerase chain reaction data. We propose a three-way linear mixed-effects model to select optimal housekeeping genes. The model can accommodate multiple continuous and/or categorical variables and includes sample random effects, gene fixed effects, systematic effects, and gene by systematic effect interactions. We propose using the intraclass correlation coefficient among gene expression levels as the stability measure to select housekeeping genes that have low within-sample variation. Global hypothesis testing is proposed to ensure that the selected housekeeping genes are free of systematic effects and gene by systematic effect interactions. The gene combination with the highest lower bound of the 95% confidence interval for the intraclass correlation coefficient and no significant systematic effects is selected for normalization. A sample size calculation based on the estimation accuracy of the stability measure is offered to help practitioners design experiments to identify housekeeping genes. We compare our methods with geNorm and NormFinder in three case studies. A free software package written in SAS (Cary, NC, U.S.A.) is available at http://d.web.umkc.edu/daih under the software tab.
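
As a rough illustration of the stability measure and the sample size idea described above, the following Python sketch computes a consistency-type intraclass correlation coefficient from a samples-by-genes table and a sample size for estimating it to a desired confidence interval width. It is not the authors' SAS package. It assumes a simplified model with sample random effects and gene fixed effects only (no systematic effects or interactions), and it uses a commonly cited approximation attributed to Bonett (2002) for the sample size, which may differ from the calculation used in the paper. The function names `icc_consistency` and `approx_sample_size` are hypothetical.

```python
from math import ceil
from statistics import NormalDist, mean


def icc_consistency(data):
    """Consistency-type ICC from a two-way layout (samples x genes).

    data: list of rows, one per sample; each row holds the expression
    values (e.g. Ct) of the k candidate genes in that sample.
    Assumption: only sample and gene effects are modeled; systematic
    effects and interactions from the full mixed model are ignored.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Between-sample mean square.
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)

    # Residual mean square after removing sample and gene effects.
    ss_err = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse)


def approx_sample_size(rho, k, width, conf=0.95):
    """Approximate number of samples so that the confidence interval
    for an ICC near `rho`, based on k genes, has total width `width`.

    Assumption: uses the approximation attributed to Bonett (2002),
    n = 8 z^2 (1 - rho)^2 (1 + (k - 1) rho)^2 / (k (k - 1) w^2) + 1.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = (
        8 * z ** 2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
        / (k * (k - 1) * width ** 2)
        + 1
    )
    return ceil(n)


if __name__ == "__main__":
    # Made-up Ct values for three samples and two candidate genes.
    ct = [[20.1, 25.0], [22.0, 27.2], [24.1, 28.9]]
    print(icc_consistency(ct))
    print(approx_sample_size(rho=0.8, k=3, width=0.2))
```

In this sketch, a gene combination whose expression levels move together across samples yields an ICC close to 1. Ranking combinations by the lower confidence bound, as the abstract describes, would additionally require an interval estimate, which is omitted here.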
