Analysis of counts with two latent classes, with application to risk assessment based on physician-visit records of cancer survivors.

Motivated by a cancer survivorship program, this paper studies event counts from two categories of individuals whose category membership is unobservable. We formulate the counts using a latent class model and consider two likelihood-based inference procedures: maximum likelihood estimation (MLE) and a pseudo-MLE procedure. The pseudo-MLE uses additional information on one of the latent classes, which reduces the computational burden and can increase estimation efficiency. We establish the consistency and asymptotic normality of the proposed pseudo-MLE and present an extended Huber sandwich estimator as a robust variance estimator for it. The finite-sample properties of the two parameter estimators, along with their variance estimators, are examined by simulation. The proposed methodology is illustrated with physician-claim data from the cancer program.
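
To make the contrast between the two procedures concrete, the following is a minimal sketch and not the paper's implementation. It assumes each latent class generates Poisson counts, which the abstract does not state, and it stands in for the "additional information on one of the latent classes" by holding the class-2 rate at an externally supplied value. The function names, the EM fitting routine, and the simulated parameter values are all hypothetical.

```python
import math
import random


def poisson_logpmf(y, rate):
    """Log of the Poisson probability mass function at count y."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def class1_weight(y, p, rate1, rate2):
    """Posterior probability that count y came from class 1."""
    a = math.log(p) + poisson_logpmf(y, rate1)
    b = math.log(1.0 - p) + poisson_logpmf(y, rate2)
    # Cap the exponent so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(min(b - a, 700.0)))


def fit_em(counts, rate2_fixed=None, n_iter=300):
    """EM for a two-class Poisson mixture (an assumed model form).

    With rate2_fixed=None, all three parameters are estimated (full MLE).
    With rate2_fixed given, the class-2 rate is plugged in and only the
    class-1 proportion and rate are estimated (a pseudo-MLE-style fit).
    """
    n = len(counts)
    mean = sum(counts) / n
    p = 0.5
    rate1 = 1.5 * mean + 0.1
    rate2 = rate2_fixed if rate2_fixed is not None else 0.5 * mean + 0.1
    for _ in range(n_iter):
        # E-step: class-1 membership probabilities.
        w = [class1_weight(y, p, rate1, rate2) for y in counts]
        sw = sum(w)
        # M-step: weighted proportion and weighted means, kept off the boundary.
        p = min(max(sw / n, 1e-9), 1.0 - 1e-9)
        rate1 = max(sum(wi * y for wi, y in zip(w, counts)) / max(sw, 1e-12), 1e-9)
        if rate2_fixed is None:
            num = sum((1.0 - wi) * y for wi, y in zip(w, counts))
            rate2 = max(num / max(n - sw, 1e-12), 1e-9)
    return p, rate1, rate2


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


# Simulated data: 30% of individuals in class 1 (rate 8), the rest in class 2 (rate 2).
rng = random.Random(0)
counts = [sample_poisson(8.0 if rng.random() < 0.3 else 2.0, rng) for _ in range(2000)]

full = fit_em(counts)                      # estimates (p, rate1, rate2)
pseudo = fit_em(counts, rate2_fixed=2.0)   # class-2 rate supplied externally
```

In this sketch the pseudo-MLE-style fit searches over two parameters instead of three, which is one way the reduced computational intensity mentioned above can arise. Variance estimation for the plug-in fit, such as a sandwich estimator, is not covered here.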
