Dummy Variables: Mechanics v. Interpretation

Regressions containing dummy variables are easily estimated by the familiar expedient of "dropping out" one of the categories, but the result is often awkward to interpret. Since coefficients of dummy variables are determined only up to an additive constant, however, the equation can be transformed into a more easily interpretable form by adding an appropriately chosen constant to each coefficient. For most regressions the constants should be chosen to force the mean of the transformed coefficients to equal 0. For logarithmic regressions the constants should be chosen to force the mean of the antilogs of the coefficients to equal 1. With logarithmic demand curves fitted to monthly data the resulting antilogs become monthly seasonal indexes.

The technical procedure by which dummy variables are used to capture the influence of categorical variables in regression equations is generally familiar (see Goldberger (1964), Kmenta (1971), Johnston (1960), or, to go back near the beginning of things, Suits (1957)). In many cases, particularly where only two classes of observation are involved, results presented in the usual way involve no special problems of interpretation. For example, use of a dummy variable to distinguish pre-war from post-war behavior, or to measure the shift in a relationship during the period of a strike, is readily understood by any reader. But where a set of several dummy variables is employed to measure the variation in behavior among a number of classes (regions, education groups, age brackets, and the like), there is often an important difference between the purely mechanical problem of fitting the regression and the quite different problem of presenting the results in the most effective fashion. The purpose of this paper is to call attention to this distinction, and to illustrate it by simple examples.
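The re-centering summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's own computation: the function names, the region labels, and every number are hypothetical. The dropped category is assumed to be carried with a coefficient of 0, and the intercept is assumed to absorb the opposite of whatever constant is added to the dummy coefficients, so that fitted values are unchanged.

```python
import math


def recenter_additive(intercept, coefs):
    """Shift dummy coefficients so that their (unweighted) mean is 0.

    `coefs` maps each category to its coefficient and must include the
    dropped category with coefficient 0.0. Hypothetical helper.
    """
    k = -sum(coefs.values()) / len(coefs)      # constant added to every coefficient
    shifted = {cat: b + k for cat, b in coefs.items()}
    return intercept - k, shifted              # intercept absorbs the shift


def recenter_log(intercept, coefs):
    """Shift dummy coefficients of a logarithmic regression so that the
    mean of their antilogs is 1. Natural logs are assumed here.
    """
    mean_antilog = sum(math.exp(b) for b in coefs.values()) / len(coefs)
    k = -math.log(mean_antilog)
    shifted = {cat: b + k for cat, b in coefs.items()}
    indexes = {cat: math.exp(b) for cat, b in shifted.items()}
    return intercept - k, shifted, indexes


# Made-up estimates from a regression with "north" as the dropped category.
raw = {"north": 0.0, "south": 2.0, "west": -0.5}
new_intercept, centered = recenter_additive(10.0, raw)

# Made-up estimates from a logarithmic regression, same dropped category.
raw_log = {"north": 0.0, "south": 0.2, "west": -0.1}
log_intercept, log_centered, indexes = recenter_log(1.0, raw_log)
```

In this sketch each entry of `centered` reads as a deviation from the average over categories rather than from the dropped category, and each entry of `indexes` reads as a ratio to that average. With twelve monthly dummies in place of the three regions, `indexes` would play the role of the monthly seasonal indexes.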