Rank-Sum Tests for Clustered Data

The Wilcoxon rank-sum test is widely used to test the equality of two populations, because it makes fewer distributional assumptions than parametric procedures such as the t-test. However, the Wilcoxon rank-sum test can be used only if data are independent. When data are clustered, tests based on generalized estimating equations (GEEs) that generalize the t-test have been proposed. Here we develop a rank-sum test that can be used when data are clustered. As an application, we use our rank-sum test to develop a nonparametric test of association between a genetic marker and a quantitative trait locus. We also give a rank-sum test for equivalence of three or more populations that generalizes the Kruskal–Wallis test to situations with clustered data. Unlike previous rank tests for clustered data, our proposal is valid when members of the same cluster belong to different groups, or when the correlation between cluster members differs across groups.
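For orientation, the classical rank-sum statistic that this work generalizes can be sketched as follows. This is only the ordinary Wilcoxon rank-sum test for two independent samples, not the clustered-data test developed here. The function names `midranks` and `rank_sum_test` are hypothetical, and the normal approximation shown omits the tie correction to the variance.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Classical Wilcoxon rank-sum statistic for two INDEPENDENT samples.

    Returns (W, z): W is the sum of the pooled-sample ranks of x, and z is
    its normal approximation under the null (no tie correction applied).
    """
    m, n = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:m])
    mean = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    return w, (w - mean) / sqrt(var)


# Illustrative data only (not taken from the paper).
w, z = rank_sum_test([1.1, 2.3, 3.0], [2.9, 4.2, 5.5, 6.1])
print(w, round(z, 3))  # 7.0 -1.768
```

The null mean and variance used above assume that all observations are mutually independent, which is exactly the assumption that fails when data are clustered.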