Asymptotically distribution free tests in heteroscedastic unbalanced high dimensional ANOVA

In this paper, we develop the asymptotic theory for hypothesis testing in high-dimensional analysis of variance (HANOVA) when the distributions are completely unspecified. Most results in the literature have been restricted to continuous data in designs with no more than two factors. Here we formulate the local alternatives in terms of departures from the null distribution, so that the responses can be either continuous or categorical. The asymptotic theory is presented for testing main factor effects and interaction effects of up to order three in unbalanced designs with heteroscedastic variances and an arbitrary number of factors. The test statistics are based on quadratic forms whose asymptotic theory is derived under non-classical settings where the number of variables is large while the number of replications may be limited. Simulation results show that the proposed test statistics perform well in both continuous and discrete HANOVA in terms of type I error accuracy, power, and computing time. The proposed test is illustrated with an analysis of gene expression data from Arabidopsis thaliana in response to multiple abiotic stresses.
