Estimating correlation with multiply censored data arising from the adjustment of singly censored data.

Environmental data are frequently left censored because of the detection limits of laboratory assay procedures; that is, some observations are known only to fall below a censoring point (the detection limit). This complicates statistical analysis of the data. In this paper, we examine methods for estimating the correlation between variables, each of which is censored at multiple points. Multiple censoring frequently arises when singly censored laboratory results are adjusted for physical sample size. We discuss maximum likelihood (ML) estimation of the correlation and introduce a new method (cp.mle2) that, instead of using the multiply censored data directly, relies on ML estimates of the covariance of the singly censored laboratory data. We compare the ML methods with Kendall's tau-b (ck.taub), which is a modification of Kendall's tau adjusted for ties, and with several commonly used simple substitution methods: correlations estimated with nondetects set to the detection limit divided by 2, and correlations based on detects only (cs.det), with nondetects set to missing. The methods are compared using simulations and real data. In the simulations, censoring levels are varied from 0 to 90%, ρ from -0.8 to 0.8, and v (variance of physical sample size) is set to 0 and 0.5, for a total of 550 parameter combinations with 1000 replications at each combination. We find that with increasing levels of censoring most of the correlation methods are highly biased. Estimates from the simple substitution methods generally tend toward zero when the data are singly censored and toward one when they are multiply censored. ck.taub tends toward zero. The least biased method is cp.mle2; however, it has higher variance than some of the other estimators. Overall, cs.det performs the worst and cp.mle2 the best.
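
The simple comparison estimators named above (substitution of half the detection limit, detects only, and Kendall's tau-b) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not cover the ML methods. All function names are hypothetical, and the lognormal test data, the single detection limit of 1.0, and the use of Pearson correlation on the raw scale are assumptions made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def substitute_half_dl(values, dl):
    """Replace nondetects (values below dl) with dl / 2."""
    return [v if v >= dl else dl / 2 for v in values]


def corr_half_dl(x, y, dl_x, dl_y):
    """Correlation with nondetects set to half the detection limit."""
    return pearson(substitute_half_dl(x, dl_x), substitute_half_dl(y, dl_y))


def corr_detects_only(x, y, dl_x, dl_y):
    """Correlation using only pairs in which both values are detects."""
    kept = [(a, b) for a, b in zip(x, y) if a >= dl_x and b >= dl_y]
    return pearson([a for a, _ in kept], [b for _, b in kept])


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct O(n^2) pair counting; ties are allowed.

    For singly censored data, nondetects can first be mapped to one common
    value below the detection limit so that they count as ties.
    """
    conc = disc = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    return (conc - disc) / math.sqrt(untied_x * untied_y)


if __name__ == "__main__":
    # Assumed test data: correlated lognormal pairs, roughly 50% censored.
    rng = random.Random(1)
    rho, dl = 0.8, 1.0
    x, y = [], []
    for _ in range(500):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(math.exp(z1))
        y.append(math.exp(rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    print("uncensored  :", pearson(x, y))
    print("DL/2        :", corr_half_dl(x, y, dl, dl))
    print("detects only:", corr_detects_only(x, y, dl, dl))
    print("tau-b       :", kendall_tau_b(substitute_half_dl(x, dl),
                                         substitute_half_dl(y, dl)))
```

Rerunning the demonstration with a higher detection limit is one way to see how each estimate moves as the censoring level rises, although a single simulated data set says nothing about bias or variance across replications.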