Statistical Approaches to Decreasing the Discrepancy of Non-detects in qPCR Data

Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Despite extensive research in qPCR laboratory protocols, normalization, and statistical analysis, little attention has been given to qPCR non-detects – those reactions failing to produce a minimum amount of signal. While most current software replaces these non-detects with a value representing the limit of detection, recent work suggests that this introduces substantial bias in estimation of both absolute and differential expression. Recently developed single imputation procedures, while better than previously used methods, underestimate residual variance, which can lead to anti-conservative inference. We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of relevant model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. In the proposed modeling framework, there are three sources of uncertainty: parameter estimation, the missing data mechanism, and measurement error. All three sources of variability are incorporated in the multiple imputation and direct estimation algorithms. We demonstrate the applicability of these methods on three real qPCR data sets and perform an extensive simulation study to assess model sensitivity to misspecification of the missing data mechanism, to the number of replicates within the sample, and to the overall size of the data set. The proposed methods result in unbiased estimates of the model parameters; therefore, these approaches may be beneficial when estimating both absolute and differential gene expression. The developed methods are implemented in the R/Bioconductor package nondetects. 
The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments, providing more confidence in generating scientific hypotheses and performing downstream analysis.
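
As a rough illustration of the multiple imputation idea described above, the following Python sketch draws plausible Ct values for non-detects and pools the resulting estimates with Rubin's rules. It is a toy under stated assumptions, not the algorithm implemented in the nondetects package: the logistic missingness curve, its coefficients `b0` and `b1`, the values of `mu` and `sigma`, and all function names are hypothetical. Because these parameters are treated as known, the sketch reflects only uncertainty from the missing data mechanism and measurement error, not from parameter estimation.

```python
import math
import random
import statistics


def p_nondetect(ct, b0=-30.0, b1=0.8):
    """Hypothetical logistic missingness model: a higher Ct is more
    likely to be recorded as a non-detect (coefficients are assumed)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * ct)))


def draw_nondetect(mu, sigma, rng, max_tries=100000):
    """Rejection-sample a Ct value from N(mu, sigma) conditional on the
    reaction being a non-detect under p_nondetect."""
    for _ in range(max_tries):
        ct = rng.gauss(mu, sigma)
        if rng.random() < p_nondetect(ct):
            return ct
    raise RuntimeError("acceptance rate too low for these parameters")


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    return q_bar, within + (1.0 + 1.0 / m) * between


def impute_and_pool(observed, n_missing, mu, sigma, m=20, seed=1):
    """Create m completed data sets and pool the mean Ct across them."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        imputed = [draw_nondetect(mu, sigma, rng) for _ in range(n_missing)]
        completed = observed + imputed
        estimates.append(statistics.fmean(completed))
        # Variance of the sample mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


# Four detected replicates plus two non-detects (made-up values).
observed_ct = [33.1, 34.0, 33.6, 34.8]
pooled_mean, total_var = impute_and_pool(
    observed_ct, n_missing=2, mu=35.0, sigma=1.5
)
print(pooled_mean, total_var)
```

The between-imputation term in `pool_rubin` is what a single imputation procedure omits, which is one way to see why single imputation can understate the variance of the resulting estimates.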
