The work reported here offers the first thorough comparison of two established methodologies for the creation of small-area synthetic microdata: synthetic reconstruction and combinatorial optimisation. Two computer models, Pop91SR and Pop91CO, have been developed to reconstruct enumeration district (ED) level populations from 1991 Census data. The adequacy of their outputs has been assessed at cellular, tabular and overall levels. Consideration has also been given to the impact on outputs of aggregating ED estimates into wards. Compared with previous synthetic reconstruction models, Pop91SR employs the following new techniques: (a) use of the Sample of Anonymised Records (SAR) to examine relationships between variables and determine the ordering of conditional probabilities; (b) a three-level modelling approach to create the conditional distributions, combining data from the Small Area Statistics (SAS), the Local Base Statistics (LBS) and the SAR; and (c) adoption of a modified Monte Carlo sampling procedure. These techniques maximise the use of information and greatly reduce sampling error, thereby increasing estimation accuracy. The major improvements in Pop91CO are: (a) a new criterion (RSSZm) for the selection of household combinations; (b) selection of households from the relevant SAR region, where possible; and (c) a revised set of stopping rules to control the number of iterations and improve the consistency of outputs. Using RSSZm as the selection criterion yields significant improvements in the quality of the synthetic data generated. An assessment of outputs from the two rival approaches, produced using the same small-area constraints, suggests that both can produce synthetic microdata that fit constraining tables extremely well.
However, further examination reveals that datasets generated by combinatorial optimisation vary considerably less than those created by synthetic reconstruction, at both ED and ward levels, making combinatorial optimisation the approach of choice when a single set of synthetic microdata is to be created.

Acknowledgements

The work reported in this paper was undertaken as part of an ESRC-funded project on ‘The creation of a national set of validated small-area microdata’, award no. R000237744.
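
Illustrative sketch

To make the combinatorial optimisation approach summarised in the abstract more concrete, the following is a minimal sketch, not the Pop91CO implementation. It assumes a single constraining table, represents each sample household by one category label, and uses total absolute error as a simple stand-in for the selection criterion (it is not RSSZm). The function names, the toy sample and the stopping rule (stop at zero error or after a fixed number of iterations) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def tabulate(households, selection):
    """Count the selected households falling in each category."""
    return Counter(households[i] for i in selection)


def total_absolute_error(counts, target):
    """Sum of absolute differences between estimated and target counts."""
    categories = set(counts) | set(target)
    return sum(abs(counts.get(c, 0) - target.get(c, 0)) for c in categories)


def combinatorial_optimisation(households, target, n_select,
                               max_iter=10_000, seed=0):
    """Hill-climbing selection of households (with replacement) from a sample.

    Starts from a random combination and repeatedly proposes replacing one
    selected household with a random one from the sample, keeping the swap
    only if it reduces the error against the target table.
    """
    rng = random.Random(seed)
    selection = [rng.randrange(len(households)) for _ in range(n_select)]
    counts = tabulate(households, selection)
    error = total_absolute_error(counts, target)

    for _ in range(max_iter):
        if error == 0:  # assumed stopping rule: perfect fit reached
            break
        slot = rng.randrange(n_select)
        candidate = rng.randrange(len(households))
        old_cat = households[selection[slot]]
        new_cat = households[candidate]
        if old_cat == new_cat:
            continue  # swap would not change the table
        # Apply the swap to the running table and evaluate it.
        counts[old_cat] -= 1
        counts[new_cat] += 1
        new_error = total_absolute_error(counts, target)
        if new_error < error:
            selection[slot] = candidate
            error = new_error
        else:
            # Reject: restore the table to its previous state.
            counts[old_cat] += 1
            counts[new_cat] -= 1
    return selection, error


# Toy sample of household-size categories and a toy target table for one area.
sample = ["1p", "2p", "3p+", "2p", "1p", "3p+", "2p", "1p", "2p", "3p+"]
target = {"1p": 3, "2p": 4, "3p+": 3}
selection, error = combinatorial_optimisation(sample, target, n_select=10)
print(tabulate(sample, selection), error)
```

In practice several constraining tables would be fitted at once, and, as the abstract notes, the choice of selection criterion and stopping rules materially affects the quality and consistency of the output.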