A fast Monte Carlo expectation–maximization algorithm for estimation in latent class model analysis with an application to assess diagnostic accuracy for cervical neoplasia in women with atypical glandular cells

In this article, we use a latent class model (LCM), with disease prevalence modeled as a function of covariates, to assess diagnostic test accuracy when the true disease status is unobserved but results from three or more conditionally independent diagnostic tests are available. A fast Monte Carlo expectation–maximization (MCEM) algorithm for binary diagnostic data is implemented to estimate the parameters of interest, namely the sensitivity and specificity of each test and the disease prevalence as a function of covariates. To obtain standard errors for constructing confidence intervals, the missing information principle is applied to adjust the information matrix estimates. Through an extensive Monte Carlo study, we compare standard errors based on the adjusted information matrix with bootstrap standard errors, both computed with the fast MCEM algorithm. The simulations show that, under certain scenarios, the adjusted information matrix approach yields standard error estimates similar to those of the bootstrap, and that bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a data set of 122 subjects from a Gynecologic Oncology Group study of significant cervical lesion diagnosis in women with atypical glandular cells of undetermined significance, comparing the diagnostic accuracy of a histology-based evaluation, a carbonic anhydrase IX (CA-IX) biomarker-based test, and a human papillomavirus (HPV) DNA test.
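The estimation scheme summarized above can be illustrated with a minimal sketch. It is not the article's fast MCEM algorithm. It assumes a single continuous covariate with a logistic prevalence model, three binary tests that are conditionally independent given the latent disease status D, fixed starting values, a linearly growing Monte Carlo sample size, and a fixed number of iterations. All function names (`simulate`, `posterior`, `fit_logistic`, `mcem`) are hypothetical. For this simple model the posterior P(D=1 | tests, x) is available in closed form, so the Monte Carlo E-step here only imitates sampling the latent disease status.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, beta, se, sp, rng):
    """Hypothetical data: covariate x, latent disease d, K conditionally independent tests."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        d = rng.random() < expit(beta[0] + beta[1] * x)
        tests = [int(rng.random() < (s if d else 1.0 - c)) for s, c in zip(se, sp)]
        data.append((x, tests))
    return data


def posterior(x, tests, beta, se, sp):
    """P(D=1 | tests, x) under conditional independence of the tests given D."""
    p = expit(beta[0] + beta[1] * x)
    l1, l0 = p, 1.0 - p
    for t, s, c in zip(tests, se, sp):
        l1 *= s if t else 1.0 - s
        l0 *= (1.0 - c) if t else c
    return l1 / (l1 + l0)


def fit_logistic(xs, ybar, beta, steps=5):
    """Newton-Raphson for logistic regression with fractional responses ybar in [0, 1]."""
    b0, b1 = beta
    for _ in range(steps):
        g0 = g1 = a = b = c = 0.0
        for x, y in zip(xs, ybar):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score, intercept
            g1 += (y - p) * x    # score, slope
            a += w               # entries of the 2x2 matrix X'WX
            b += w * x
            c += w * x * x
        det = a * c - b * b
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return [b0, b1]


def mcem(data, n_iter=50, m0=10, rng=None):
    if rng is None:
        rng = random.Random()
    k = len(data[0][1])
    # Starting Se/Sp above 0.5 guards against label switching.
    se, sp, beta = [0.7] * k, [0.7] * k, [0.0, 0.0]
    xs = [x for x, _ in data]
    n = len(data)
    for it in range(n_iter):
        m = m0 + it  # grow the Monte Carlo sample size as iterations proceed
        # Monte Carlo E-step: draw m Bernoulli(pi_i) values of D_i, keep their sum.
        counts = []
        for x, tests in data:
            pi = posterior(x, tests, beta, se, sp)
            counts.append(sum(rng.random() < pi for _ in range(m)))
        # M-step: closed-form Se/Sp, Newton steps for the prevalence regression.
        tot1 = sum(counts)
        tot0 = m * n - tot1
        se = [sum(s * t[j] for s, (_, t) in zip(counts, data)) / tot1 for j in range(k)]
        sp = [sum((m - s) * (1 - t[j]) for s, (_, t) in zip(counts, data)) / tot0
              for j in range(k)]
        beta = fit_logistic(xs, [s / m for s in counts], beta)
    return se, sp, beta


if __name__ == "__main__":
    rng = random.Random(2024)
    data = simulate(1000, [-0.5, 1.0], [0.85, 0.75, 0.90], [0.90, 0.80, 0.95], rng)
    se_hat, sp_hat, beta_hat = mcem(data, rng=rng)
    print([round(v, 2) for v in se_hat], [round(v, 2) for v in sp_hat],
          [round(v, 2) for v in beta_hat])
```

The M-step maximizes the complete-data log-likelihood averaged over the m draws, which is why only the per-subject counts of D=1 draws are needed. The model is invariant to swapping D=0 and D=1 while replacing each sensitivity by one minus the corresponding specificity, so starting values matter.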
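One standard statement of the missing information principle (Woodbury, 1972; Louis, 1982) is written out below. The abstract does not give the exact adjustment used, so the Monte Carlo form, with draws D^(1), ..., D^(m) from p(D | y; θ̂), is an assumption. Here θ collects the prevalence regression coefficients and the test sensitivities and specificities, y denotes the observed test results and covariates, and ℓ_c is the complete-data log-likelihood.

```latex
% Missing information principle: observed = complete - missing information
I_{o}(\hat\theta)
  = \mathrm{E}\!\left[-\frac{\partial^{2}\ell_{c}(\theta; y, D)}{\partial\theta\,\partial\theta^{\top}} \,\middle|\, y\right]_{\theta=\hat\theta}
  - \operatorname{Var}\!\left[\frac{\partial\ell_{c}(\theta; y, D)}{\partial\theta} \,\middle|\, y\right]_{\theta=\hat\theta}

% Monte Carlo approximation, with S_j = \partial\ell_c(\hat\theta; y, D^{(j)})/\partial\theta
% and H_j = -\partial^2\ell_c(\hat\theta; y, D^{(j)})/\partial\theta\,\partial\theta^{\top}
\hat I_{o}
  = \frac{1}{m}\sum_{j=1}^{m} H_{j}
  - \left[\frac{1}{m}\sum_{j=1}^{m} S_{j}S_{j}^{\top} - \bar S\,\bar S^{\top}\right],
\qquad \bar S = \frac{1}{m}\sum_{j=1}^{m} S_{j}

\widehat{\mathrm{SE}}(\hat\theta_{r}) = \sqrt{\left[\hat I_{o}^{-1}\right]_{rr}}
```

Using the first term alone (the complete-data information) would overstate precision, because it treats the latent disease status as observed; subtracting the conditional variance of the complete-data score removes the information lost to the missing D.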
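The bootstrap percentile intervals mentioned above follow a generic recipe: resample subjects with replacement, re-estimate on each resample, and read off empirical quantiles. In the article each re-estimate would be a full fast-MCEM fit of the LCM. To keep this sketch short and self-contained, the estimator is a cheap stand-in (the proportion of positive results on a single synthetic test), and the names `percentile_bootstrap`, `positive_rate`, `n_boot`, and `alpha` are hypothetical.

```python
import math
import random


def percentile_bootstrap(data, estimator, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for estimator(data).

    Each replicate resamples subjects with replacement (same size as data)
    and re-applies the estimator; the interval endpoints are simple
    order-statistic quantiles of the sorted replicates.
    """
    if rng is None:
        rng = random.Random()
    n = len(data)
    reps = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = reps[math.floor((alpha / 2) * (n_boot - 1))]
    hi = reps[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def positive_rate(results):
    """Stand-in estimator: fraction of positive (1) results."""
    return sum(results) / len(results)


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic 0/1 test results, for illustration only.
    toy = [int(rng.random() < 0.3) for _ in range(200)]
    print(positive_rate(toy), percentile_bootstrap(toy, positive_rate, rng=rng))
```

Resampling whole subjects, rather than individual test results, preserves the within-subject dependence among the tests induced by the shared latent disease status.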
