Correcting for unobserved heterogeneity in hazard models: an application of the Heckman-Singer procedure to demographic data.

Most hazard models that demographers use assume that all heterogeneity is captured by measured covariates. Although some research has allowed for unmeasured heterogeneity, it has assumed a particular parametric distribution for it. An inappropriate choice of functional form may lead to biased results. This discussion compares parametric and nonparametric estimates of the hazard in the absence of heterogeneity and, for parametric forms of the hazard, compares estimates that correct for heterogeneity without specifying a parametric distribution for it with estimates that do not correct for heterogeneity. Data on birth intervals and child mortality from the Korean National Fertility Survey were used. This exploratory analysis suggested the following conclusions. Hazard models that correct for unmeasured heterogeneity with the Heckman-Singer nonparametric techniques are costly to estimate. In part, the expense arises from the need to start the optimization at many points to try to ensure that one finds the global rather than a local maximum. With a given set of starting values, convergence was achieved more rapidly when gradient methods were used rather than the EM algorithm suggested by Heckman and Singer. Results, and the inferences one would draw from them, are sensitive to the parametric form assumed for the hazard. In this analysis of the covariates of mortality, a Gompertz and a Weibull distribution were used. Both have hazards that appear to be reasonable candidates for the underlying pattern of mortality rates by age, although in previous research the Weibull has generally been preferred for the analysis of child mortality. When those models were estimated with no correction for heterogeneity, the results were nearly identical both to each other and to those estimated from a model with a nonparametric hazard. When heterogeneity was allowed for, the two models yielded different final results. Results can also be very sensitive to model specification.
The mortality analyses yielded different results depending on how breastfeeding was handled, although the results for fertility were more stable. When heterogeneity was ignored, sensitivity to model specification was reduced considerably. It is suspected that stable parameter estimates are much harder to obtain when a rare event such as child mortality is studied. These tentative methodological findings lead to the conclusion that the investigator who wants to avoid model misspecification by correcting for unobserved heterogeneity is treading on dangerous ground. Even with a nonparametric representation of heterogeneity, results were sensitive to the choice of hazard.
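
As a rough illustration of the kind of likelihood discussed above, the following Python sketch evaluates a right-censored duration log-likelihood in which unobserved heterogeneity enters as a multiplicative factor on the hazard and is represented by a small set of mass points with probabilities, in the spirit of a Heckman-Singer discrete mixing distribution. It is a simplified sketch, not the procedure used in the study: covariates are omitted, the mass points and weights are taken as given rather than estimated, and all function and parameter names are hypothetical.

```python
import math


def weibull_hazard(t, scale, shape):
    """Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_cum_hazard(t, scale, shape):
    """Integrated Weibull hazard from 0 to t."""
    return (t / scale) ** shape


def gompertz_hazard(t, a, b):
    """Gompertz hazard: a * exp(b * t), with b != 0."""
    return a * math.exp(b * t)


def gompertz_cum_hazard(t, a, b):
    """Integrated Gompertz hazard from 0 to t."""
    return (a / b) * (math.exp(b * t) - 1.0)


def mixture_loglik(durations, events, hazard, cum_hazard, params,
                   mass_points, weights):
    """Log-likelihood with a discrete multiplicative heterogeneity term.

    Assumption (not from the source): each observation contributes
    sum_k w_k * (v_k * h(t)) ** d * exp(-v_k * H(t)), where d is 1 for an
    observed event and 0 for a censored duration, v_k are the mass points,
    and w_k are their probabilities (summing to 1).
    """
    total = 0.0
    for t, d in zip(durations, events):
        h = hazard(t, *params)
        big_h = cum_hazard(t, *params)
        lik = sum(w * (v * h) ** d * math.exp(-v * big_h)
                  for v, w in zip(mass_points, weights))
        total += math.log(lik)
    return total


# Illustrative values only: two durations, the second one censored.
durations = [1.0, 2.0]
events = [1, 0]

# A single mass point at 1.0 reproduces the model without heterogeneity.
ll_no_het = mixture_loglik(durations, events, weibull_hazard,
                           weibull_cum_hazard, (2.0, 1.0), [1.0], [1.0])

# Two mass points: a low-risk and a high-risk subgroup.
ll_het = mixture_loglik(durations, events, weibull_hazard,
                        weibull_cum_hazard, (2.0, 1.0),
                        [0.5, 2.0], [0.7, 0.3])
```

In practice a function like this would be maximized over the hazard parameters, mass points, and weights together. Because the abstract reports concern about local maxima, the maximization would typically be repeated from many starting values and the best result kept.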