Fitting regression models with response‐biased samples

This paper extends the work in Lawless, Kalbfleisch, & Wild (1999) on fitting regression models with response-biased samples, that is, samples where some or all of the covariates are missing for some units and the probability that this happens depends in part on the value of the response for that unit. In general, the resulting likelihood depends on the distribution of the covariates, but we are only interested in methods that do not involve modelling this distribution. We look at a variety of methods based on estimating equations and at the relationship of these methods to semi-parametric efficient methods in cases where such methods exist, and we show ways of obtaining efficiency gains that can sometimes be dramatic.

The Canadian Journal of Statistics 39: 519–536; 2011 © 2011 Statistical Society of Canada

[1]  J. Robins, et al. Estimation of Regression Coefficients When Some Regressors are not Always Observed, 1994.

[2]  M. Pepe, et al. A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data, 1994.

[3]  D. Pfeffermann. The Role of Sampling Weights when Modeling Survey Data, 1993.

[4]  T. Lumley. Robustness of Semiparametric Efficiency in Nearly-Correct Models for Two-Phase Samples, 2017, 1707.05924.

[5]  Margaret S. Pepe, et al. A mean score method for missing and auxiliary covariate data in regression models, 1995.

[6]  G. Kalton, et al. Handling missing data in survey research, 1996, Statistical methods in medical research.

[7]  Norman E. Breslow, et al. Logistic regression for two-stage case-control data, 1988.

[8]  S. Zeger, et al. Longitudinal data analysis using generalized linear models, 1986.

[9]  T. Raghunathan, et al. Use of Low-Dose Oral Contraceptives and Stroke in Young Women, 1997, Annals of Internal Medicine.

[10]  A. Scott, et al. Fitting regression models to case-control data by maximum likelihood, 1997.

[11]  J. Rao, et al. Variance estimation under two-phase sampling with application to imputation for missing data, 1995.

[12]  Alastair Scott, et al. Maximum likelihood for generalised case-control studies, 2001.

[13]  Yannan Jiang, et al. Secondary analysis of case‐control data, 2006, Statistics in medicine.

[14]  C. Särndal, et al. Calibration Estimators in Survey Sampling, 1992.

[15]  Chris J. Skinner, et al. Quasi-Score Tests with Survey Data, 1998.

[16]  A. Scott, et al. Population-based case-control studies, 2009.

[17]  Yuichi Hirose, et al. Semi-parametric efficiency bounds for regression models under response-selective sampling: the profile likelihood approach, 2010.

[18]  Jerald F. Lawless, et al. Semiparametric methods for response‐selective and missing data problems in regression, 1999.

[19]  T. Louis, et al. A Note on Marginal Linear Regression with Correlated Response Data, 2000.

[20]  Alastair Scott, et al. Efficient estimation in multi-phase case-control studies, 2010.

[21]  D. Rubin. Inference and Missing Data, 1975.

[22]  L. Kish, et al. Inference from Complex Samples, 1974.

[23]  Norman E. Breslow, et al. Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling, 2003.

[24]  Thomas Lumley, et al. Improved Horvitz–Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology, 2009, Statistics in biosciences.

[25]  Estimating incidence rates from population-based case-control studies in the presence of nonrespondents, 2002.

[26]  Roderick J. A. Little, et al. Statistical Analysis with Missing Data, 2002.

[27]  W. Haenszel, et al. Statistical aspects of the analysis of data from retrospective studies of disease, 1959, Journal of the National Cancer Institute.

[28]  A. Scott, et al. Re-using data from case-control studies, 1997, Statistics in medicine.

[29]  D. Binder. On the variances of asymptotically normal estimators from complex surveys, 1983.

[30]  S. Eguchi, et al. A paradox concerning nuisance parameters and projected estimating functions, 2004.