Optimal Sampling Strategies for Statistical Models with Discrete Dependent Variables

The objective of this paper is to improve the cost-effectiveness of data-gathering procedures for models with discrete dependent variables. It is assumed throughout that the true value of the parameter vector is approximately known and that, given that information, one must select a statistically optimal number of observations from different population subgroups to refine the accuracy of the estimate. It is shown that the problem can be reduced to a small mathematical program whose objective function can be written down after a few preliminary algebraic manipulations. For binary choice models, these preliminary calculations are simple enough to be implemented on 1979 state-of-the-art programmable hand calculators. It is also shown that choice-based and mixed data can be optimally selected in a similar way; in particular, binary choice-based samples drawn from a single population group are so easy to analyze that all calculations can be performed by hand. Multinomial models are less amenable to hand calculation, except perhaps for trinomial models, which require the evaluation of a double integral. The technique extends naturally to limited dependent variable regression models.
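
As a rough illustration of the kind of small mathematical program described above, the following sketch allocates a fixed sampling budget across population subgroups for a binary choice model. It rests on several assumptions that are not taken from the paper: a two-parameter logit form, one scalar covariate per subgroup, the trace of the inverse Fisher information (evaluated at the approximate parameter values) as the objective, and an exhaustive search over integer allocations. All function and variable names are hypothetical, and the paper's own objective function may differ.

```python
import math
from itertools import product


def logit_prob(a, b, x):
    """Choice probability under an assumed logit model P(y=1|x) = 1/(1+exp(-(a+b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def group_information(a, b, x):
    """Per-observation Fisher information for (a, b) in a subgroup with covariate x.

    Returned as the three distinct entries (I11, I12, I22) of a symmetric 2x2 matrix.
    """
    p = logit_prob(a, b, x)
    w = p * (1.0 - p)
    return (w, w * x, w * x * x)


def variance_trace(alloc, infos):
    """Trace of the inverse of the total information matrix (sum of asymptotic variances)."""
    i11 = sum(n * g[0] for n, g in zip(alloc, infos))
    i12 = sum(n * g[1] for n, g in zip(alloc, infos))
    i22 = sum(n * g[2] for n, g in zip(alloc, infos))
    det = i11 * i22 - i12 * i12
    if det <= 1e-12:
        return math.inf  # parameters not identified by this allocation
    return (i11 + i22) / det  # trace of the inverse of a symmetric 2x2 matrix


def best_allocation(a, b, xs, costs, budget):
    """Exhaustively search integer sample sizes per subgroup within the budget."""
    infos = [group_information(a, b, x) for x in xs]
    best_score, best_alloc = math.inf, None
    ranges = [range(budget // c + 1) for c in costs]
    for alloc in product(*ranges):
        if sum(n * c for n, c in zip(alloc, costs)) > budget:
            continue
        score = variance_trace(alloc, infos)
        if score < best_score:
            best_score, best_alloc = score, alloc
    return best_score, best_alloc


# Hypothetical inputs: prior guess (a, b) = (0, 1), three subgroups, unit costs.
score, alloc = best_allocation(0.0, 1.0, xs=[-1.0, 0.0, 1.0], costs=[1, 1, 1], budget=12)
print("allocation:", alloc, "variance trace:", score)
```

Exhaustive search is used here only because the example is tiny. With more subgroups or a larger budget, the same objective could be passed to a continuous optimizer and the solution rounded.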