Contaminated Chi-Square Modeling and Large-Scale ANOVA Testing

We propose a convenient moment-based procedure for testing the omnibus null hypothesis of no contamination of a central chi-square distribution by a non-central chi-square distribution. In sharp contrast with likelihood ratio tests for mixture models, there is no need for re-sampling or random field theory to obtain critical values. Rather, critical values are available from an asymptotic normal distribution, and there is excellent agreement between nominal and actual significance levels. This procedure may be used to model numerous chi-square statistics, obtained via monotonic transformations of F statistics, from large-scale ANOVA testing, such as that encountered in microarray data analysis. In that context, modeling chi-square statistics instead of p-values may improve detection of differential gene expression, as we demonstrate through simulation studies, while also reducing false declarations of differential expression, as we illustrate in a case study on aging and cognition. Our procedure may also be incorporated into a gene filtration process, which may reduce Type II errors on genewise null hypotheses by justifying lighter controls for Type I errors.
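To make concrete the idea of obtaining critical values from an asymptotic normal distribution, here is a deliberately simplified sketch. It is not the statistic proposed in the paper, whose exact form the abstract does not give. It only uses the fact that a central chi-square variable with k degrees of freedom has mean k and variance 2k, so the standardized sample mean is approximately standard normal under the null, and contamination by a non-central chi-square shifts it upward. The function names, the simulation settings (n = 5000, 3 degrees of freedom, contamination proportion 0.2, non-centrality 4) and the one-sided rejection rule are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def first_moment_z_test(stats, df):
    """One-sided z-test of H0: stats are i.i.d. central chi-square(df).

    Under H0 the mean is df and the variance is 2*df, so by the central
    limit theorem the standardized sample mean is approximately N(0, 1).
    Non-central contamination raises the mean, so large z counts against H0.
    Illustrative only: this is NOT the paper's test statistic.
    """
    n = len(stats)
    z = (fmean(stats) - df) / math.sqrt(2.0 * df / n)
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


def central_chi2(rng, df):
    # A chi-square(df) variate is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)


def noncentral_chi2(rng, df, ncp):
    # (Z + sqrt(ncp))^2 plus an independent central chi-square(df - 1).
    shifted = (rng.gauss(0.0, 1.0) + math.sqrt(ncp)) ** 2
    rest = central_chi2(rng, df - 1) if df > 1 else 0.0
    return shifted + rest


def simulate(rng, n, df, gamma, ncp):
    # Each statistic is drawn from the contaminating component with
    # probability gamma, and from the central component otherwise.
    return [
        noncentral_chi2(rng, df, ncp) if rng.random() < gamma
        else central_chi2(rng, df)
        for _ in range(n)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
null_stats = simulate(rng, n=5000, df=3, gamma=0.0, ncp=0.0)
mixed_stats = simulate(rng, n=5000, df=3, gamma=0.2, ncp=4.0)

z_null, p_null = first_moment_z_test(null_stats, df=3)
z_mixed, p_mixed = first_moment_z_test(mixed_stats, df=3)
print(f"null:         z = {z_null:.2f}, p = {p_null:.3g}")
print(f"contaminated: z = {z_mixed:.2f}, p = {p_mixed:.3g}")
```

No bootstrap or random-field calculation appears anywhere in the sketch: the p-value is read directly from the standard normal distribution, which is the convenience the abstract emphasizes. A test built on the first moment alone would be a crude stand-in for a full moment-based procedure.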
