Standard error and sample size determination for estimation of probabilities based on a test variable.

A method of sample size determination for estimating probabilities based on a test variable is presented. The focus is on estimating the sensitivity and specificity of medical tests, although the methods can be applied to other areas of study such as engineering reliability. Examples are given for determining the sample sizes required to classify patients with cutaneous lupus erythematosus based on the incidence of several markers; in this example, the test variable is the number of markers present. The methodology employs a weighted average of model-based and non-model-based estimates of the probability, with the weights determined by the closeness to, or the confidence in, the given model. Formulas and charts required for determining sample size are provided for test variables that can be modeled by the binomial, Poisson, or normal distributions, i.e., the most commonly encountered distributions for counting events (binomial and Poisson) and for measurements (normal). However, the methods can be applied to any distribution, including multivariate distributions. Especially when relatively small probabilities (rare events) are being estimated, the techniques help safeguard against undersampling brought on by unwarranted confidence in a test variable distribution and against the oversampling required for high accuracy in non-model-based probability estimators.
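The idea of blending a model-based and a non-model-based estimate can be illustrated with a minimal Python sketch. It is not the paper's procedure: the abstract does not reproduce the formulas or the rule for choosing the weights, so the fixed weight `w`, the number of markers `m`, and all function names here are hypothetical. The sketch assumes a binomial test variable (number of markers present out of `m`) and estimates the tail probability P(X >= k). It also includes the standard normal-approximation sample size for a plain proportion, which shows why purely empirical estimates of rare events demand large samples.

```python
import math
from statistics import NormalDist


def binomial_tail(m, theta, k):
    """Model-based P(X >= k) for X ~ Binomial(m, theta)."""
    return sum(
        math.comb(m, j) * theta**j * (1 - theta) ** (m - j)
        for j in range(k, m + 1)
    )


def empirical_tail(counts, k):
    """Non-model-based estimate: fraction of subjects with at least k markers."""
    return sum(1 for c in counts if c >= k) / len(counts)


def blended_tail(counts, m, k, w):
    """Weighted average of the two estimates.

    w is a hypothetical confidence weight in [0, 1] placed on the binomial
    model; how the weight should be chosen is NOT specified here.
    """
    theta_hat = sum(counts) / (m * len(counts))  # binomial MLE of theta
    model = binomial_tail(m, theta_hat, k)
    empirical = empirical_tail(counts, k)
    return w * model + (1 - w) * empirical


def empirical_sample_size(p, half_width, confidence=0.95):
    """Normal-approximation sample size for estimating a proportion p
    to within +/- half_width using only the empirical proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z**2 * p * (1 - p) / half_width**2)
```

With these assumptions, `empirical_sample_size(0.5, 0.05)` gives the familiar 385 subjects. Setting `w = 0` recovers the purely empirical estimate and `w = 1` the purely model-based one; intermediate weights trade protection against model misspecification for a smaller required sample.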
