Finite mixture models with concomitant information: assessing diagnostic criteria for diabetes

The World Health Organization (WHO) diagnostic criteria for diabetes mellitus were determined in part by evidence that, in some populations, the plasma glucose level 2 h after an oral glucose load is a mixture of two distinct distributions. We present a finite mixture model in which the two component densities are generalized linear models and the mixing probability is a logistic regression model. The model allows us to estimate the prevalence of diabetes and the sensitivity and specificity of the diagnostic criteria as functions of covariates, and to estimate them in the absence of an external standard. Sensitivity is the probability that a test indicates disease conditional on disease being present; specificity is the probability that a test indicates no disease conditional on disease being absent. We obtained maximum likelihood estimates via the EM algorithm and derived standard errors both from the information matrix and by the bootstrap. In the application to data from the Diabetes in Egypt Project, a two-component mixture model fits well, and the two components are interpreted as normal and diabetic. The component means and variances are similar to those found in other populations. The minimum-misclassification cutpoints decrease with age and, compared with the 200 mg dl⁻¹ cutpoint recommended by the WHO, are lower in urban areas and higher in rural areas. These differences are modest, and our results generally support the WHO criterion. Our methods allow concomitant data to be included directly, whereas past analyses were based on partitioning the data.
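A minimal sketch of fitting this kind of model by EM, not the authors' implementation. It assumes the simplest GLM case, normal components with identity link, applied to log-transformed 2-h glucose so that normal components are plausible; the paper's components may use other families or transformations. Component 0 is read as normal and component 1 as diabetic, and the mixing probability is a logistic regression on covariates W. The function names (`fit_mixture`, `sens_spec`, `min_misclass_cutpoint`), the covariate coding and the simulated data are all hypothetical. Standard errors (information matrix or bootstrap) are not computed here.

```python
import math

import numpy as np


def expit(eta):
    """Inverse logit, used for the mixing-probability model."""
    return 1.0 / (1.0 + np.exp(-eta))


def norm_pdf(y, mu, sigma):
    """Normal density, vectorised over y and mu."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def norm_cdf(z):
    """Standard normal CDF for a scalar z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_mixture(y, X, W, n_iter=500, tol=1e-8):
    """EM for two normal linear regressions on X mixed with P(comp 1 | W) = expit(W @ gamma)."""
    tau = (y > np.quantile(y, 0.75)).astype(float)  # crude start: top quartile -> component 1
    gamma = np.zeros(W.shape[1])
    loglik_old = -np.inf
    for _ in range(n_iter):
        # M-step, components: weighted least squares and weighted residual variance
        params = []
        for wts in (1.0 - tau, tau):
            sw = np.sqrt(wts)
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            sigma = math.sqrt(np.sum(wts * (y - X @ beta) ** 2) / np.sum(wts))
            params.append((beta, sigma))
        # M-step, mixing model: Newton-Raphson logistic regression on fractional responses tau
        for _ in range(50):
            p = expit(W @ gamma)
            hess = W.T @ (W * (p * (1.0 - p))[:, None])
            step = np.linalg.solve(hess, W.T @ (tau - p))
            gamma = gamma + step
            if np.max(np.abs(step)) < 1e-10:
                break
        # E-step: posterior probability that each subject belongs to component 1
        pi = expit(W @ gamma)
        f0 = norm_pdf(y, X @ params[0][0], params[0][1])
        f1 = norm_pdf(y, X @ params[1][0], params[1][1])
        mix = (1.0 - pi) * f0 + pi * f1
        tau = pi * f1 / mix
        loglik = float(np.sum(np.log(mix)))
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return params, gamma, loglik


def sens_spec(c, mu0, s0, mu1, s1):
    """Sensitivity and specificity of the rule 'diabetic if y > c'."""
    sens = 1.0 - norm_cdf((c - mu1) / s1)  # P(y > c | diabetic)
    spec = norm_cdf((c - mu0) / s0)        # P(y <= c | normal)
    return sens, spec


def min_misclass_cutpoint(pi, mu0, s0, mu1, s1, n_grid=4001):
    """Grid search for the c minimising pi*(1 - sens) + (1 - pi)*(1 - spec)."""
    lo = min(mu0 - 4 * s0, mu1 - 4 * s1)
    hi = max(mu0 + 4 * s0, mu1 + 4 * s1)
    grid = np.linspace(lo, hi, n_grid)
    errors = []
    for c in grid:
        sens, spec = sens_spec(c, mu0, s0, mu1, s1)
        errors.append(pi * (1.0 - sens) + (1.0 - pi) * (1.0 - spec))
    return float(grid[int(np.argmin(errors))])


# Usage on simulated log-glucose data (all numbers hypothetical, not from the paper)
rng = np.random.default_rng(0)
n = 2000
age_std = (rng.uniform(20, 80, n) - 50) / 10  # age centred at 50, in decades
X = np.column_stack([np.ones(n), age_std])
W = X  # same covariates drive the mixing probability
z = rng.random(n) < expit(-2.0 + 0.6 * age_std)
y = np.where(z, 5.5 + 0.1 * age_std + 0.35 * rng.standard_normal(n),
             4.7 + 0.05 * age_std + 0.15 * rng.standard_normal(n))

params, gamma, loglik = fit_mixture(y, X, W)
(b0, s0), (b1, s1) = params
x60 = np.array([1.0, 1.0])  # covariate profile for a 60-year-old
pi60 = float(expit(x60 @ gamma))
cut = min_misclass_cutpoint(pi60, float(x60 @ b0), s0, float(x60 @ b1), s1)
print("estimated prevalence at 60:", pi60)
print("minimum-misclassification cutpoint (mg/dl):", math.exp(cut))
print("sens, spec at 200 mg/dl:", sens_spec(math.log(200.0), float(x60 @ b0), s0, float(x60 @ b1), s1))
```

The logistic M-step treats the posterior probabilities `tau` as fractional responses, which is what maximising the expected complete-data log-likelihood requires for the mixing parameters. Because prevalence, sensitivity and specificity are all evaluated at a covariate profile, the cutpoint varies with covariates in the way the abstract describes; one way to obtain bootstrap standard errors would be to refit `fit_mixture` on subjects resampled with replacement.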
