“Ecological” Inference: The Use of Aggregate Data to Study Individuals

Because they are inexpensive and easy to obtain, because they may be available under circumstances in which survey data are unavailable, and because they eliminate many of the measurement problems of survey research, data on geographic units such as counties or census tracts are often used by political scientists to measure individual behavior. This has involved us in the long-standing problem of inferring individual-level relationships from aggregate data, which was first raised by W. S. Robinson in the early 1950s. In this paper, I shall first discuss the problem raised by Robinson. I shall then review three partial solutions to the problem: the Duncan-Davis method of setting limits, Blalock's version of ecological regression, and Goodman's version of ecological regression. Finally, I shall propose some ways in which Goodman's method may be used so as to reduce the problem of bias in its estimates and make it a more reasonable tool for research.

Our difficulty, as Robinson showed, is that we cannot necessarily infer the correlation between variables, taking people as the unit of analysis, from correlations between the same variables computed with groups of people as the units. For example, the “ecological” correlation between per cent black and per cent illiterate is +0.946, whereas the correlation between color and illiteracy among individuals is only +0.203.
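The gap between the two kinds of correlation can be reproduced on a small scale. The sketch below is illustrative only: the regional counts are hypothetical (they are not Robinson's data), and the names `REGIONS`, `ecological_correlation`, and `individual_correlation` are introduced here for the example. It computes the correlation between regional percentages and compares it with the phi coefficient for the pooled individual-level 2x2 table.

```python
from math import sqrt

# Hypothetical counts per region (NOT Robinson's data), in the order:
# (black & illiterate, black & literate, nonblack & illiterate, nonblack & literate)
REGIONS = [
    (10, 90, 40, 860),
    (25, 175, 75, 725),
    (45, 255, 105, 595),
    (80, 320, 120, 480),
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ecological_correlation(regions):
    """Correlation across regions between per cent black and per cent illiterate."""
    pct_black = [(a + b) / (a + b + c + d) for a, b, c, d in regions]
    pct_illiterate = [(a + c) / (a + b + c + d) for a, b, c, d in regions]
    return pearson(pct_black, pct_illiterate)


def individual_correlation(regions):
    """Phi coefficient for the 2x2 table obtained by pooling all individuals."""
    a, b, c, d = (sum(cell) for cell in zip(*regions))
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    print("ecological:", round(ecological_correlation(REGIONS), 3))
    print("individual:", round(individual_correlation(REGIONS), 3))
```

With these made-up counts the regional percentages move together almost perfectly, while the association among individuals is weak, which is the same pattern as in Robinson's example. In practice the individual-level cells are exactly what aggregate data do not supply, so only the first of the two quantities can be computed directly.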

[1] K. O'Lessker. Who Voted for Hitler? A New Look at the Class Basis of Naziism, 1968, American Journal of Sociology.

[2] R. Crain, et al. Community status as a dimension of local decision-making, 1967, American Sociological Review.

[3] G. Pomper. Classification of Presidential Elections, 1967, The Journal of Politics.

[4] Leo A. Goodman, et al. Some Alternatives to Ecological Correlation, 1959, American Journal of Sociology.

[5] Otis Dudley Duncan, et al. An Alternative to Ecological Correlation, 1953.