Maximum Likelihood Estimations and EM Algorithms With Length-Biased Data

Length-biased sampling is well recognized in economics, industrial reliability, etiology applications, and epidemiological, genetic, and cancer screening studies. Length-biased right-censored data have a unique structure that differs from that of traditional survival data, so the nonparametric and semiparametric estimation and inference methods developed for traditional survival data are not directly applicable to them. We propose new expectation-maximization algorithms for estimation based on full likelihoods involving infinite-dimensional parameters under three settings for length-biased data: estimating a nonparametric distribution function, estimating a nonparametric hazard function under an increasing failure rate constraint, and jointly estimating the baseline hazard function and the covariate coefficients under the Cox proportional hazards model. Extensive simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and are more efficient than estimators obtained from estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency of the estimators and establish the asymptotic normality of the semiparametric maximum likelihood estimators under the Cox model using modern empirical process theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online.
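
To make the notion of length bias concrete, the following is a minimal sketch of the simplest case only: uncensored length-biased observations, for which the classical nonparametric maximum likelihood estimator of the distribution function weights each observation by the inverse of its length. It is not the expectation-maximization algorithm proposed here, which handles right censoring. The Gamma model, the parameter values, and the function names are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative assumption: the unbiased lifetimes follow Gamma(shape=2, scale=1.5),
# so the true mean is 2 * 1.5 = 3.0. Under length-biased sampling the observed
# density is proportional to t * f(t), which for a Gamma(k, theta) density is
# exactly Gamma(k + 1, theta).
SHAPE, SCALE, N = 2.0, 1.5, 20000


def length_biased_gamma(rng, shape, scale, n):
    """Draw n length-biased observations from a Gamma(shape, scale) population."""
    return rng.gamma(shape + 1.0, scale, size=n)


def npmle_cdf(x, t):
    """Inverse-length-weighted estimate of the unbiased CDF (uncensored data only)."""
    w = 1.0 / x
    w = w / w.sum()  # normalized weights proportional to 1 / x_i
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.array([w[x <= ti].sum() for ti in t])


def harmonic_mean(x):
    """Estimate of the unbiased mean, using E[1/T] = 1/mu under length bias."""
    return len(x) / np.sum(1.0 / x)


rng = np.random.default_rng(0)
x = length_biased_gamma(rng, SHAPE, SCALE, N)

naive_mean = x.mean()            # targets (SHAPE + 1) * SCALE, not the true mean
corrected_mean = harmonic_mean(x)  # targets SHAPE * SCALE
cdf_at_3 = npmle_cdf(x, 3.0)[0]    # targets the unbiased CDF at t = 3

print(f"naive mean     : {naive_mean:.3f}")
print(f"corrected mean : {corrected_mean:.3f}")
print(f"F_hat(3.0)     : {cdf_at_3:.3f}")
```

The naive sample mean overstates the population mean because longer lifetimes are more likely to be sampled, while the inverse-length weights undo that selection. With right censoring these closed-form weights are no longer available, which is the situation the likelihood-based algorithms in this paper address.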
