Hidden Markov Model-based population synthesis

Micro-simulation travel demand and land use models require a synthetic population, which consists of a set of agents characterized by demographic and socio-economic attributes. Two main families of population synthesis techniques can be distinguished: (a) fitting methods (iterative proportional fitting, iterative proportional updating) and (b) combinatorial optimization methods. In recent years, a third, better-performing family of population synthesis procedures has emerged: Markov process-based methods such as Markov Chain Monte Carlo (MCMC) simulation. In this paper, an extended Hidden Markov Model (HMM)-based approach is presented, which can serve as a better alternative to the existing methods. The approach is characterized by great flexibility and efficiency in terms of data preparation and model training. The HMM is able to reproduce the structural configuration of a given population from an unlimited number of micro-samples and a marginal distribution. A single marginal distribution of the considered population is sufficient as a boundary condition to "guide" the synthesis of the whole population. Model training and testing are performed using the Survey on the Workforce of 2013 and the Belgian National Household Travel Survey of 2010. Results indicate that the HMM method captures the complete heterogeneity of the micro-data, in contrast to standard fitting approaches. The method provides accurate results, as it reproduces the marginal distributions and their corresponding multivariate joint distributions with an acceptable error rate (i.e., SRMSE = 0.54 for 6 synthesized attributes). Furthermore, the HMM outperforms iterative proportional fitting (IPF) for small sample sizes, even though it uses less input data than IPF. Finally, simulations show that the HMM can merge information provided by multiple data sources to produce good population estimates.
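
For readers unfamiliar with the baseline and the error metric named above, the following is a minimal sketch of two-dimensional IPF and of one common form of the standardized root mean square error (RMSE over all cells divided by the mean reference cell value). It is not the implementation used in the paper: the function names `ipf_2d` and `srmse`, the toy seed table, and the exact SRMSE normalization are assumptions made for illustration only.

```python
import numpy as np


def ipf_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Illustrative 2-D IPF: rescale a seed table until its margins match the targets."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: scale every row so that it sums to its target.
        row_sums = table.sum(axis=1, keepdims=True)
        table *= np.divide(row_targets[:, None], row_sums,
                           out=np.ones_like(row_sums), where=row_sums > 0)
        # Column step: scale every column so that it sums to its target.
        col_sums = table.sum(axis=0, keepdims=True)
        table *= np.divide(col_targets[None, :], col_sums,
                           out=np.ones_like(col_sums), where=col_sums > 0)
        # Columns now match exactly; stop once the rows match as well.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table


def srmse(estimated, reference):
    """Assumed SRMSE variant: RMSE over all cells divided by the mean reference cell."""
    est = np.asarray(estimated, dtype=float).ravel()
    ref = np.asarray(reference, dtype=float).ravel()
    return np.sqrt(np.mean((est - ref) ** 2)) / ref.mean()


# Toy example (made-up numbers): a 2x2 sample cross-tabulation fitted to known margins.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf_2d(seed, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])
```

IPF needs both a seed table and a target margin for every fitted dimension, which is the input requirement the abstract contrasts with the HMM approach. An SRMSE of 0 under this definition means the estimated and reference tables agree cell by cell.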