Controlling false-negative errors in microarray differential expression analysis: a PRIM approach.

MOTIVATION: Theoretical considerations suggest that current microarray screening algorithms may fail to detect many true differences in gene expression (Type II analytic errors). We assessed 'false negative' error rates in differential expression analyses by conventional linear statistical models (e.g. t-test), microarray-adapted variants (e.g. SAM, Cyber-T), and a novel strategy based on hold-out cross-validation. The latter approach employs the machine-learning algorithm Patient Rule Induction Method (PRIM) to infer minimum thresholds for reliable change in gene expression from Boolean conjunctions of fold-induction and raw fluorescence measurements.

RESULTS: Monte Carlo analyses based on four empirical data sets show that conventional statistical models and their microarray-adapted variants overlook more than 50% of genes showing significant up-regulation. Conjoint PRIM prediction rules recover approximately twice as many differentially expressed transcripts while maintaining strong control over false-positive (Type I) errors. As a result, experimental replication rates increase and total analytic error rates decline. RT-PCR studies confirm that gene inductions detected by PRIM but overlooked by other methods represent true changes in mRNA levels. PRIM-based conjoint inference rules thus represent an improved strategy for high-sensitivity screening of DNA microarrays.

AVAILABILITY: Freestanding JAVA application at http://microarray.crump.ucla.edu/focus
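To make the idea of a conjoint rule concrete, the following is a minimal Python sketch of how a Boolean conjunction of a fold-induction threshold and a raw-fluorescence threshold could be applied to held-out genes and scored for false-negative and false-positive rates. It does not implement PRIM itself, which would be the step that infers the thresholds. The function names, the threshold values (2.0-fold, 200 fluorescence units), and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (fold induction, raw fluorescence, truly changed?) -- hypothetical record layout
Gene = Tuple[float, float, bool]


def conjoint_rule(fold: float, intensity: float,
                  min_fold: float, min_intensity: float) -> bool:
    """Call a gene 'changed' only if BOTH thresholds are met (Boolean AND)."""
    return fold >= min_fold and intensity >= min_intensity


def error_rates(genes: List[Gene], min_fold: float,
                min_intensity: float) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate) for one rule."""
    fn = fp = pos = neg = 0
    for fold, intensity, truth in genes:
        call = conjoint_rule(fold, intensity, min_fold, min_intensity)
        if truth:
            pos += 1
            fn += not call   # missed a true change (Type II error)
        else:
            neg += 1
            fp += call       # called a change that is not real (Type I error)
    return (fn / pos if pos else 0.0, fp / neg if neg else 0.0)


# Toy held-out set; the values are illustrative only.
held_out: List[Gene] = [
    (3.1, 850.0, True),
    (2.4, 120.0, False),   # large fold change on a weak signal
    (1.6, 900.0, True),    # modest fold change on a strong signal
    (1.1, 700.0, False),
]

# Assumed thresholds: at least 2.0-fold AND at least 200 fluorescence units.
fnr, fpr = error_rates(held_out, min_fold=2.0, min_intensity=200.0)
print(f"false-negative rate = {fnr:.2f}, false-positive rate = {fpr:.2f}")
```

In this toy set, the intensity threshold is what rejects the high-fold, low-signal gene, while lowering `min_fold` to 1.5 would recover the modest but strongly expressed gene without admitting either negative. That trade-off between the two thresholds is the kind a hold-out procedure would tune.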
