Logistic regression with incompletely observed categorical covariates: investigating the sensitivity against violation of the missing at random assumption.

Missing values in the covariates are a widespread complication in statistical inference for regression models. Maximum likelihood estimation requires at least a partial specification of the distribution of the covariates; for categorical covariates, log-linear models can be used. In addition, the missing at random assumption is required, which excludes any dependence of the occurrence of missing values on the unobserved covariate values. This assumption is often highly questionable. We present a framework for specifying alternative missing value mechanisms such that maximum likelihood estimation of the regression parameters remains possible under each specified alternative. This allows the sensitivity of a single estimate to violations of the missing at random assumption to be investigated. The possible results of a sensitivity analysis are illustrated by artificial examples. The practical application is demonstrated by the analysis of two case-control studies.
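
The idea of a sensitivity analysis can be illustrated with a minimal Python sketch. It is a toy setting chosen for brevity and not the framework of the paper: one binary outcome y, one binary covariate x that is sometimes missing, and no further covariates. The function name `log_odds_ratio`, the ratio parameter `c`, and all counts are hypothetical. The assumed alternative mechanism fixes, for each outcome y, the ratio c[y] = P(x observed | y, x=1) / P(x observed | y, x=0), with c = (1, 1) corresponding to missing at random. Under these assumptions the model is saturated, so the fit reproduces the observed cell counts and the estimate is available in closed form.

```python
import math

def log_odds_ratio(n, m, c):
    """Log odds ratio of y on x under an assumed missing value mechanism.

    n[y][x] : count of subjects with outcome y and observed covariate x
    m[y]    : count of subjects with outcome y and x missing
    c[y]    : assumed ratio P(x observed | y, x=1) / P(x observed | y, x=0);
              c = (1, 1) corresponds to missing at random (an assumption
              of this sketch, not a parametrisation taken from the paper)
    """
    filled = []
    for y in (0, 1):
        total = n[y][0] + n[y][1] + m[y]
        # Observation probability for x=0 that makes the reconstructed
        # counts add up to the total number of subjects with outcome y.
        q = (n[y][0] + n[y][1] / c[y]) / total
        if not (0.0 < q <= 1.0 and q * c[y] <= 1.0):
            raise ValueError(f"c[{y}]={c[y]} is incompatible with the data")
        # Reconstructed full-data counts for x=0 and x=1.
        filled.append((n[y][0] / q, n[y][1] / (q * c[y])))
    (a00, a01), (a10, a11) = filled
    return math.log((a11 * a00) / (a10 * a01))

# Hypothetical counts: n[y][x] complete observations, m[y] with x missing.
n = [[40, 20], [30, 30]]
m = [20, 20]

# Vary the assumed mechanism among cases (y=1) and watch the estimate move.
for c1 in (0.8, 1.0, 1.25, 1.5):
    beta = log_odds_ratio(n, m, (1.0, c1))
    print(f"c1={c1:4.2f}  odds ratio={math.exp(beta):.3f}")
```

In this toy setting the resulting odds ratio equals the complete-case odds ratio multiplied by c[0]/c[1], so the estimate changes only when the assumed mechanism differs between cases and controls. Values of c that would imply an observation probability above 1 are rejected as incompatible with the data.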
