Bootstrapping the Gini coefficient of inequality

Despite current interest in the causes and consequences of plant size hierarchies, opinions differ about the best way to evaluate a size distribution. Weiner and Solbrig (1984) have argued that size hierarchy means size inequality, and that the Gini coefficient of inequality (Sen 1973) is more relevant than the skewness or variance of plant size (e.g., Turner and Rabinowitz 1983) for most ecological questions. Weiner (1985) presents a formula to estimate the population Gini coefficient (G) from a sample and states that reasonable confidence intervals for the population Gini coefficient can be obtained by a bootstrapping procedure (Efron 1982). This note evaluates the accuracy of these bootstrap confidence intervals. We find that they are reasonably accurate when calculated from samples of 50 or more individuals but too narrow when calculated from smaller samples.

The bootstrap procedure uses the observed data to estimate the theoretical, and usually unknown, distribution from which the data came (Efron 1982, Meyer et al. 1986). Bootstrap samples of the same size as the original sample are repeatedly drawn by sampling with replacement from the observed data, and the test statistic (here, the Gini coefficient) is calculated for each bootstrap sample. The distribution of G's obtained from bootstrap sampling can then be used to estimate the standard deviation of the observed statistic and to set confidence limits on it (Efron 1982). The bootstrap procedure does not require any knowledge of the distribution of the statistic in question, may have certain optimal properties (Efron 1981, but see Schenker 1985, Wu 1986), and can be used when the standard deviation or confidence intervals for the statistic are unknown or difficult to calculate analytically.

The accuracy of any method for computing confidence intervals can be evaluated by generating data from a known distribution with a known parameter.
If many samples of data from the known distribution are generated and a confidence interval calculated from each, the number of confidence intervals that include the parameter can be determined. An accurate confidence interval includes the known parameter the stated percentage of the time; for example, a 95% confidence interval should include the true value in 95% of the random samples of data. Although in some situations bootstrap confidence intervals are relatively accurate (Efron 1982:79), in other situations they are too narrow (Schenker 1985, Meyer et al. 1986).
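The sample estimate of G and the bootstrap procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used in this note: the function names `gini` and `bootstrap_gini` are hypothetical; the estimator is the common mean-absolute-difference form, which may differ from the formula in Weiner (1985), for example by a factor n/(n - 1); the confidence limits are simple percentile limits, only one of several ways to turn a bootstrap distribution into an interval; and the plant sizes in the example are invented for illustration.

```python
import random


def gini(sizes):
    """Sample Gini coefficient via the mean absolute difference.

    ASSUMPTION: uses G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean),
    evaluated from the sorted sample as
    G = sum_i (2i - n - 1) * x_(i) / (n**2 * mean), i = 1..n.
    This may differ from Weiner's (1985) estimator. Sizes must be
    non-negative with a positive mean.
    """
    x = sorted(sizes)
    n = len(x)
    mean = sum(x) / n
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * n * mean)


def bootstrap_gini(sizes, n_boot=2000, alpha=0.05, seed=None):
    """Bootstrap SD and percentile confidence limits for the Gini coefficient."""
    rng = random.Random(seed)
    n = len(sizes)
    # Each bootstrap sample has the same size as the original sample
    # and is drawn with replacement from the observed data.
    boot = sorted(gini(rng.choices(sizes, k=n)) for _ in range(n_boot))
    mean_g = sum(boot) / n_boot
    sd = (sum((g - mean_g) ** 2 for g in boot) / (n_boot - 1)) ** 0.5
    # Percentile limits: the alpha/2 and 1 - alpha/2 quantiles of the bootstrap G's.
    lo = boot[int(n_boot * alpha / 2)]
    hi = boot[int(n_boot * (1 - alpha / 2)) - 1]
    return sd, (lo, hi)


if __name__ == "__main__":
    # Hypothetical plant sizes (e.g., dry mass), for illustration only.
    sizes = [1.2, 0.4, 3.5, 0.9, 2.2, 0.3, 5.1, 1.8, 0.7, 1.1]
    sd, (lo, hi) = bootstrap_gini(sizes, seed=1)
    print(f"G = {gini(sizes):.3f}, SD = {sd:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Sorting once and weighting each ordered size by (2i - n - 1) gives the same value as summing all n² absolute differences, but in O(n log n) time, which matters when the statistic is recomputed for thousands of bootstrap samples. Fixing `seed` only makes the resampling reproducible.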
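The coverage check described above can be sketched as a small simulation: draw many samples from a distribution whose Gini coefficient is known, compute a bootstrap percentile interval from each, and record the fraction of intervals that contain the true value. In this sketch the exponential distribution is an assumed test case, chosen because its population Gini coefficient is exactly 0.5; the function names, sample sizes, and numbers of replicates are hypothetical, illustrative choices rather than the design of this study.

```python
import random


def gini(sizes):
    # Mean-absolute-difference Gini coefficient computed from the sorted sample.
    x = sorted(sizes)
    n = len(x)
    mean = sum(x) / n
    return sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1)) / (n * n * mean)


def percentile_ci(sizes, rng, n_boot=500, alpha=0.05):
    # Bootstrap percentile interval: resample with replacement, keep the central 1 - alpha.
    n = len(sizes)
    boot = sorted(gini(rng.choices(sizes, k=n)) for _ in range(n_boot))
    return boot[int(n_boot * alpha / 2)], boot[int(n_boot * (1 - alpha / 2)) - 1]


def coverage(n, true_g=0.5, n_sims=200, n_boot=500, seed=0):
    """Fraction of bootstrap 95% intervals that contain the known Gini coefficient.

    ASSUMPTION: data come from an exponential distribution (rate 1),
    whose population Gini coefficient is exactly 0.5.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        lo, hi = percentile_ci(sample, rng, n_boot=n_boot)
        if lo <= true_g <= hi:
            hits += 1
    return hits / n_sims
```

For an accurate 95% interval, `coverage(n)` should be close to 0.95; comparing, say, `coverage(10)` with `coverage(100)` contrasts small and larger samples. Because this estimator divides by n² rather than n(n - 1), its expected value is roughly (n - 1)/n times the true G, so in small samples part of any shortfall in coverage may reflect that bias rather than the bootstrap itself.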