Synthetic Population Generation Without a Sample

The advent of microsimulation in the transportation sector has created a need for extensive disaggregated data on the population whose behavior is modeled. Because of the cost of collecting these data and existing privacy regulations, this need is often met by creating a synthetic population from aggregate data. Although several techniques for generating such a population are known, they suffer from a number of limitations. The first is the need for a sample of the population for which fully disaggregated data must be collected, although such samples may not exist or may not be financially feasible. The second is the assumption that the aggregate data used are consistent, which is rarely the case because these data often come from different sources and are collected, possibly at different times, using different protocols. The paper presents a new synthetic population generator in the class of Synthetic Reconstruction methods, whose objective is to overcome these limitations. It proceeds in three successive main steps: generation of individuals, generation of the joint distributions of household types, and generation of households by gathering individuals. The main idea in these generation steps is to use data at the most disaggregated level possible to define joint distributions, from which individuals and households are randomly drawn. The method also makes explicit use of both continuous and discrete optimization and uses the χ² metric to measure distances between estimated and generated distributions. The new generator is applied to construct a synthetic population of approximately 10,000,000 individuals and 4,350,000 households located in the 589 municipalities of Belgium. The statistical quality of the generated population is discussed using criteria drawn from the literature, and it is shown that the new population generator produces excellent results.
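
The core idea of drawing individuals from a joint distribution and measuring the fit with a χ² distance can be illustrated with a minimal Python sketch. It is not the paper's generator: the attribute categories, the probabilities, the function names `draw_individuals` and `chi_square_distance`, and the use of the Pearson form of the χ² statistic are all assumptions made for illustration, and the optimization steps and the household assembly are omitted.

```python
import random
from collections import Counter


def draw_individuals(joint, n, seed=0):
    """Draw n individuals (attribute tuples) from a joint distribution.

    `joint` maps each attribute combination to its probability.
    """
    rng = random.Random(seed)
    cells = list(joint)
    weights = [joint[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n)


def chi_square_distance(expected, observed):
    """Pearson chi-square statistic between expected and observed cell counts.

    Assumption: the Pearson form is used here; the source only states
    that a chi-square metric is used.
    """
    return sum(
        (observed.get(cell, 0) - e) ** 2 / e
        for cell, e in expected.items()
        if e > 0
    )


# Hypothetical joint distribution over (age class, gender).
# The values are illustrative only and are not taken from the paper.
joint = {
    ("0-17", "F"): 0.10, ("0-17", "M"): 0.11,
    ("18-64", "F"): 0.31, ("18-64", "M"): 0.30,
    ("65+", "F"): 0.10, ("65+", "M"): 0.08,
}

n = 10_000
population = draw_individuals(joint, n)
observed = Counter(population)
expected = {cell: p * n for cell, p in joint.items()}
distance = chi_square_distance(expected, observed)
print(f"chi-square distance: {distance:.3f}")
```

A small distance indicates that the drawn population reproduces the target joint distribution closely; a distance of zero is obtained only when every cell count matches its expected value exactly.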
