Testing the equality of distributions of random vectors with categorical components

We develop a method for testing the equality of two or more distributions of random vectors with categorical components. We define a function that gives a distance between any two data vectors. Each observed data vector is linked with its nearest neighbor(s). The test statistic is the number of edges linking observations from different distributions. Inference is conditional on the number of observations from each distribution and the number of times each of the data vectors is observed in the pooled sample. Permutation testing and asymptotics are used to estimate the observed significance level.
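The procedure described above can be illustrated with a minimal sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the Hamming distance stands in for the distance function, ties are handled by linking an observation to all of its nearest neighbors, and the test is treated as lower-tailed (few cross-distribution edges taken as evidence against equality). The function names are hypothetical. Shuffling the group labels keeps the group sizes and the pooled counts of each data vector fixed, which matches the conditioning described in the abstract.

```python
import random


def hamming(u, v):
    """Assumed distance: number of components in which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def nearest_neighbor_edges(data, dist=hamming):
    """Undirected edges (i, j), i < j, linking each observation to all of
    its nearest neighbors in the pooled sample (ties are all kept)."""
    n = len(data)
    edges = set()
    for i in range(n):
        dists = [(dist(data[i], data[j]), j) for j in range(n) if j != i]
        d_min = min(d for d, _ in dists)
        for d, j in dists:
            if d == d_min:
                edges.add((min(i, j), max(i, j)))
    return edges


def cross_edge_count(edges, labels):
    """Test statistic: number of edges joining observations whose labels differ."""
    return sum(labels[i] != labels[j] for i, j in edges)


def permutation_p_value(data, labels, n_perm=2000, seed=0):
    """Monte Carlo permutation estimate of the lower-tail significance level.

    The neighbor graph depends only on the pooled data, so it is built once;
    only the group labels are shuffled.
    """
    rng = random.Random(seed)
    edges = nearest_neighbor_edges(data)
    observed = cross_edge_count(edges, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cross_edge_count(edges, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy pooled sample: two groups of categorical vectors.
    data = [("a", "x"), ("a", "x"), ("a", "y"),
            ("b", "y"), ("b", "y"), ("b", "x")]
    labels = [0, 0, 0, 1, 1, 1]
    stat, p = permutation_p_value(data, labels, n_perm=999, seed=1)
    print("cross-group edges:", stat, "estimated p-value:", p)
```

With more than two distributions the same statistic applies unchanged, since an edge is counted whenever its two endpoints carry different labels.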
