The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., "field," or "global," significance) in meteorology and climatology is to count the number of individual (or "local") tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the "false discovery rate" (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker's test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.
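The two procedures the abstract compares can be made concrete in a few lines. The following is a minimal Python/NumPy sketch, not an implementation from the paper: the function names `walker_test` and `fdr_rejections` and the default significance levels are illustrative choices, and the Walker threshold shown is the exact form for independent local tests, which is conservative under positive spatial dependence.

```python
import numpy as np

def walker_test(p_values, alpha_global=0.05):
    """Field significance via the minimum local p value (Walker's test).

    Reject the global null hypothesis when the smallest local p value
    does not exceed p_walker = 1 - (1 - alpha_global)**(1/N), the level
    at which the minimum of N independent uniform p values would be
    judged unusual.
    """
    p = np.asarray(p_values, dtype=float)
    p_walker = 1.0 - (1.0 - alpha_global) ** (1.0 / p.size)
    return p.min() <= p_walker

def fdr_rejections(p_values, alpha_fdr=0.05):
    """Benjamini-Hochberg procedure controlling the false discovery rate.

    Sort the local p values, find the largest rank i whose p value
    satisfies p_(i) <= (i / N) * alpha_fdr, and reject every local test
    with a p value at or below that threshold. Field significance is
    declared when at least one local test is rejected.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    sorted_p = np.sort(p)
    below = sorted_p <= (np.arange(1, n + 1) / n) * alpha_fdr
    if not below.any():
        return np.zeros(n, dtype=bool)  # no rejections anywhere in the field
    threshold = sorted_p[np.nonzero(below)[0].max()]
    return p <= threshold

# Example: 1000 local p values drawn under a true global null hypothesis.
rng = np.random.default_rng(0)
p_local = rng.uniform(size=1000)
print(walker_test(p_local), int(fdr_rejections(p_local).sum()))
```

The near equivalence the abstract describes is visible in the thresholds: the FDR criterion at rank i = 1 is alpha_fdr / N, while the Walker threshold 1 - (1 - alpha_global)^(1/N) is approximately alpha_global / N for small alpha_global, so the two tests reject the global null in nearly the same circumstances, with the FDR-based test slightly more powerful because its thresholds grow with rank.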