Two-sample test for sparse high-dimensional multinomial distributions

In this paper we consider testing the equality of the probability vectors of two independent multinomial distributions in high dimension. The classical chi-square test has drawbacks in this setting, since many of the cell counts may be zero or not large enough. We propose a new test and derive its asymptotic normality and its asymptotic power function. Based on the asymptotic power function, we apply our result to a neighborhood-type test that has been studied previously, particularly for the case of fairly small p values. To compare the proposed test with existing tests, we provide numerical studies, including simulations and real data examples.
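
For orientation only, the classical baseline mentioned above can be sketched as follows. This is the textbook Pearson chi-square statistic for homogeneity of two count vectors, not the test proposed in the paper. The function name `pearson_two_sample` and the choice to drop cells that are empty in both samples are assumptions made for this illustration.

```python
def pearson_two_sample(x, y):
    """Textbook Pearson chi-square statistic for two multinomial count vectors.

    x, y: sequences of non-negative integer counts over the same cells.
    Returns (statistic, degrees_of_freedom).
    Assumption for this sketch: cells with zero count in both samples are
    dropped, because their expected counts are zero and the term is undefined.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of cells")
    n1, n2 = sum(x), sum(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("each sample needs at least one observation")
    n = n1 + n2

    stat = 0.0
    used_cells = 0
    for a, b in zip(x, y):
        total = a + b
        if total == 0:
            continue  # cell empty in both samples: carries no information
        # Expected counts under the null of equal probability vectors.
        e1 = n1 * total / n
        e2 = n2 * total / n
        stat += (a - e1) ** 2 / e1 + (b - e2) ** 2 / e2
        used_cells += 1
    return stat, used_cells - 1


if __name__ == "__main__":
    # Sparse toy data: six cells, six observations per sample.
    x = [3, 0, 1, 0, 0, 2]
    y = [0, 2, 1, 0, 3, 0]
    print(pearson_two_sample(x, y))
```

In the toy data every expected count is 1.5 or smaller, which is the kind of sparsity under which the chi-square reference distribution is usually considered unreliable.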
