An efficient hierarchical model for multi-source information fusion

Abstract

In urban and transportation research, important information is often scattered across a wide variety of independent datasets that differ in the variables they describe and in their sampling rates. Because people's activity-travel behavior depends in particular on socio-demographic and transport/urban-related variables, there is a growing need for advanced methods to merge the information provided by multiple urban and transport household surveys. In this paper, we propose a hierarchical algorithm based on a Hidden Markov Model (HMM) and an Iterative Proportional Fitting (IPF) procedure that yields quasi-perfect marginal distributions and accurate multivariate joint distributions. The model allows an arbitrary number of datasets to be combined. It is validated on a synthetic dataset with 1,000,000 observations and 8 categorical variables. The results show that the hierarchical model is particularly robust: the deviation between the simulated and observed multivariate joint distributions is extremely small and remains constant regardless of the sampling rates and of which variables each dataset contains. Moreover, the presented methodological framework allows for an intelligent merging of multiple data sources. Furthermore, heterogeneity is smoothly incorporated into micro-samples with small sampling rates that may be subject to sampling bias. These aspects are handled simultaneously to build a generalized probabilistic structure from which new observations can be inferred. A major impact in terms of expert systems is that the outputs of the hierarchical model (HM) serve as a basis for qualitative and quantitative analyses of integrated datasets.
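The abstract names IPF as the component responsible for matching marginal distributions but does not describe it. As a rough illustration only, the sketch below shows textbook IPF on a sparse N-way contingency table: a seed table (for example, counts from a small micro-sample) is repeatedly rescaled until its one-way marginals match target totals (for example, from a larger survey). It is not the authors' hierarchical model, and the HMM stage is not shown. The function names `ipf` and `marginal`, the tuple-keyed table layout, and the example variables and numbers are all hypothetical choices made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A cell of the joint table: one category index per categorical variable.
Cell = Tuple[int, ...]


def marginal(table: Dict[Cell, float], axis: int) -> Dict[int, float]:
    """One-way marginal totals of variable `axis` in a sparse joint table."""
    totals: Dict[int, float] = defaultdict(float)
    for cell, weight in table.items():
        totals[cell[axis]] += weight
    return totals


def ipf(seed: Dict[Cell, float], targets: List[Dict[int, float]],
        max_iter: int = 100, tol: float = 1e-9) -> Dict[Cell, float]:
    """Textbook IPF (hypothetical helper): rescale `seed` to match `targets`.

    All targets should share the same grand total. Cells absent from `seed`
    stay absent, so IPF cannot fill in categories the sample never observed.
    """
    table = dict(seed)
    for _ in range(max_iter):
        # One IPF cycle: adjust each variable's marginal in turn.
        for axis, target in enumerate(targets):
            current = marginal(table, axis)
            for cell in table:
                total = current[cell[axis]]
                if total > 0:
                    table[cell] *= target.get(cell[axis], 0.0) / total
        # Largest absolute gap between fitted and target marginals.
        margins = [marginal(table, axis) for axis in range(len(targets))]
        gap = max(abs(margins[axis][k] - v)
                  for axis, target in enumerate(targets)
                  for k, v in target.items())
        if gap < tol:
            break
    return table


# Hypothetical example: variable 0 = car ownership (no/yes),
# variable 1 = age group (young/middle/old); seed counts from a small sample.
seed = {(0, 0): 10.0, (0, 1): 20.0, (0, 2): 5.0,
        (1, 0): 15.0, (1, 1): 40.0, (1, 2): 20.0}
# Hypothetical target marginals from a larger survey (both sum to 1000).
targets = [{0: 400.0, 1: 600.0}, {0: 300.0, 1: 500.0, 2: 200.0}]
fitted = ipf(seed, targets)
```

Because IPF only multiplies cells by per-category factors, it preserves the seed's odds ratios (its interaction structure) while matching the target marginals. This is why IPF on its own reproduces marginals well but can only recover joint structure that is already present in the seed.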
