Estimating linear regression models in the presence of a censored independent variable

This study examined how a censored independent variable affects the estimation of regression coefficients, after adjusting for a second independent variable, under ‘naïve’ ordinary least squares (OLS), ‘partial’ OLS and full‐likelihood models. We used Monte Carlo simulations to determine the bias associated with each of the three regression methods. Naïve OLS regression introduced substantial bias into the estimate of the regression coefficient for the variable subject to a ceiling effect, and transmitted minor bias to the estimate of the coefficient for the second independent variable. High correlation between the two independent variables improved estimation of the censored variable's coefficient at the expense of estimation of the other coefficient. ‘Partial’ OLS and maximum‐likelihood estimation resulted in, at most, negligible bias. Furthermore, the full‐likelihood method was robust to misspecification of the joint distribution of the independent random variables. Lastly, we provided an empirical example using National Population Health Survey (NPHS) data to demonstrate the practical implications of our main findings and the simple methods available to circumvent the bias identified in the Monte Carlo simulations. Our results suggest that researchers need to be aware of the bias associated with naïve OLS estimation when fitting regression models in which at least one independent variable is subject to a ceiling effect. Copyright © 2004 John Wiley & Sons, Ltd.
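The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the true coefficients (all 1.0), standard bivariate normal independent variables with correlation `rho`, a ceiling at 1.0 on the first variable, the sample size, and the number of replications. The sketch also reads ‘partial’ OLS as OLS restricted to observations below the ceiling, which the abstract does not define. The full‐likelihood method is not shown.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_bias(n=1000, reps=200, rho=0.5, ceiling=1.0, seed=0):
    """Average bias of naive and 'partial' OLS when x1 is recorded with a ceiling.

    All defaults are illustrative assumptions, not the settings used in the study.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0, 1.0])          # assumed true intercept, b1, b2
    cov = np.array([[1.0, rho], [rho, 1.0]])  # assumed joint distribution of x1, x2
    naive = np.empty((reps, 3))
    partial = np.empty((reps, 3))
    for r in range(reps):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x1, x2 = x[:, 0], x[:, 1]
        y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.normal(size=n)
        x1_obs = np.minimum(x1, ceiling)      # ceiling effect: values above the limit are recorded at it
        X_obs = np.column_stack([np.ones(n), x1_obs, x2])
        naive[r] = ols(X_obs, y)              # naive OLS: treat the censored values as exact
        keep = x1 < ceiling                   # assumed reading of 'partial' OLS: drop censored rows
        partial[r] = ols(X_obs[keep], y[keep])
    return {
        "naive": naive.mean(axis=0) - beta,
        "partial": partial.mean(axis=0) - beta,
    }


bias = simulate_bias()
print("naive OLS bias   (intercept, b1, b2):", np.round(bias["naive"], 3))
print("partial OLS bias (intercept, b1, b2):", np.round(bias["partial"], 3))
```

Under these assumed settings, the naive fit should show a clearly nonzero bias in the coefficient of the censored variable and a smaller one in the second coefficient, while the restricted fit should stay close to the true values. Varying `rho` gives a rough view of how correlation between the two independent variables shifts bias from one coefficient to the other.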
