Omnibus testing and gene filtration in microarray data analysis

Abstract: When thousands of tests are performed simultaneously to detect differentially expressed genes in microarray analysis, the number of Type I errors can be immense if a multiplicity adjustment is not made. However, because of the large scale, traditional adjustment methods require very stringent significance levels for individual tests, which yield low power for detecting alterations. In this work, we describe how two omnibus tests can be used in conjunction with a gene filtration process to circumvent difficulties due to the large scale of testing. These two omnibus tests, the D-test and the modified likelihood ratio test (MLRT), can be used to investigate whether a collection of P-values has arisen from the Uniform(0,1) distribution or whether the Uniform(0,1) distribution contaminated by another Beta distribution is more appropriate. In the former case, attention can be directed to a smaller part of the genome; in the latter case, parameter estimates for the contamination model provide a frame of reference for multiple comparisons. Unlike the likelihood ratio test (LRT), both the D-test and the MLRT have simple limiting distributions under the null hypothesis of no contamination, so critical values can be obtained from standard tables. Simulation studies demonstrate that the D-test and the MLRT are superior to the AIC, the BIC, and the Kolmogorov–Smirnov test. A case study illustrates omnibus testing and filtration.
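The two competing models for the P-values can be made concrete with a small simulation. The sketch below is not the D-test or the MLRT, whose statistics are not given in this abstract. It only draws P-values from a Uniform(0,1) distribution contaminated by a Beta component and applies the Kolmogorov–Smirnov test, one of the comparators named above. The function names, the mixing weight of 0.2, the Beta(0.5, 3) shape parameters, and the sample size are all illustrative assumptions.

```python
import math
import random


def sample_contaminated_pvalues(n, weight, a, b, rng):
    """Draw n P-values from (1 - weight) * Uniform(0,1) + weight * Beta(a, b).

    weight, a and b are hypothetical illustration values, not estimates
    taken from the paper.
    """
    return [
        rng.betavariate(a, b) if rng.random() < weight else rng.random()
        for _ in range(n)
    ]


def ks_uniform(pvalues):
    """One-sample Kolmogorov-Smirnov test against Uniform(0,1).

    Returns (D, approximate P-value). The P-value uses the asymptotic
    Kolmogorov series, so it is only a large-sample approximation.
    """
    xs = sorted(pvalues)
    n = len(xs)
    # Largest gap between the empirical CDF and the Uniform(0,1) CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    t = math.sqrt(n) * d
    if t < 0.2:
        # The tail probability is numerically indistinguishable from 1 here.
        return d, 1.0
    tail = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, tail))


if __name__ == "__main__":
    rng = random.Random(2024)
    null_p = [rng.random() for _ in range(2000)]  # no contamination
    alt_p = sample_contaminated_pvalues(2000, 0.2, 0.5, 3.0, rng)
    for label, sample in (("uniform", null_p), ("contaminated", alt_p)):
        d, p = ks_uniform(sample)
        print(f"{label}: D = {d:.4f}, approximate P-value = {p:.4g}")
```

With a Beta component concentrated near zero, the contaminated sample contains an excess of small P-values, which is the pattern an omnibus test is meant to detect before any gene-by-gene multiplicity adjustment is attempted.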
