'Truncate, replicate, sample': A method for creating integer weights for spatial microsimulation

Abstract: Iterative proportional fitting (IPF) is a widely used method for spatial microsimulation. The technique results in non-integer weights for individual rows of data. This is problematic for certain applications and has led many researchers to favour combinatorial optimisation approaches such as simulated annealing. An alternative is ‘integerisation’ of IPF weights: the translation of the continuous weight variable into a discrete number of unique or ‘cloned’ individuals. We describe four existing methods of integerisation and present a new one. Our method, ‘truncate, replicate, sample’ (TRS), recognises that IPF weights consist of both ‘replication weights’ and ‘conventional weights’, the effects of which need to be separated. The procedure consists of three steps: (1) separation of replication and conventional weights by truncation; (2) replication of individuals with positive integer weights; and (3) probabilistic sampling. The results, which are reproducible using supplementary code and data published alongside this paper, show that TRS is fast and more accurate than alternative approaches to integerisation.
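
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the supplementary code published with the paper. The function name `trs_integerise`, the use of NumPy, and the details of the sampling step (drawing without replacement, with probability proportional to each row's fractional remainder, until the rounded population total is reached) are assumptions made for illustration; the abstract only states that the final step is probabilistic sampling.

```python
import numpy as np


def trs_integerise(weights, rng=None):
    """Sketch of 'truncate, replicate, sample' for one zone.

    Returns the row indices of the integerised population. The sampling
    rule used in step 3 is an assumption, not taken from the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)

    # Step 1 (truncate): split each weight into an integer part
    # (the 'replication weight') and a fractional remainder
    # (the 'conventional weight').
    truncated = np.floor(w).astype(int)
    remainders = w - truncated

    # Step 2 (replicate): clone each row as many times as its integer part.
    replicated = np.repeat(np.arange(w.size), truncated)

    # Step 3 (sample): fill the shortfall by sampling rows without
    # replacement, with probability proportional to the remainders
    # (assumed rule).
    shortfall = int(round(w.sum())) - replicated.size
    if shortfall > 0:
        sampled = rng.choice(
            w.size, size=shortfall, replace=False,
            p=remainders / remainders.sum(),
        )
    else:
        sampled = np.array([], dtype=int)

    return np.sort(np.concatenate([replicated, sampled]))


# Hypothetical IPF weights for four individuals in one zone (sum = 5).
example_weights = [0.3, 1.7, 2.5, 0.5]
ids = trs_integerise(example_weights, rng=np.random.default_rng(42))
counts = np.bincount(ids, minlength=len(example_weights))
```

In this sketch each row ends up with an integer weight equal to its truncated weight or one more, and the integerised population size matches the rounded sum of the original weights.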
