Copula Based Population Synthesis and Big Data Driven Performance Measurement

Title of dissertation: COPULA BASED POPULATION SYNTHESIS AND BIG DATA DRIVEN PERFORMANCE MEASUREMENT

Kartik Kaushik, Doctor of Philosophy, 2019

Dissertation directed by: Professor Cinzia Cirillo, Department of Civil Engineering

Transportation agencies all over the country are facing fiscal shortages due to the increasing costs of managing and maintaining facilities. The political reluctance to increase gas taxes, the primary source of revenue for many government transportation agencies, along with the improving fuel efficiency of automobiles sold to consumers, only exacerbates the financial strain. The adoption of electric vehicles threatens to completely stop the inflow of money into federal, state and regional agencies. Consequently, expansion of the network and infrastructure is slowly being replaced by a more proactive approach to managing the use of existing facilities. The insights required to manage the network more efficiently also come in part from a massive increase in the type and volume of available data. These data are paving the way for network-wide Intelligent Transportation Systems (ITS), which promise to maximize utilization of current facilities.

The waves of revolutions overtaking the usual business affairs of transportation agencies have prompted the development and application of various analytical tools, models and procedures to transportation. Contributions to this growth of analysis techniques are documented in this dissertation. There are two main domains of transportation, demand and supply, which need to be managed simultaneously to push effectively towards optimal use of resources and facilities, and to minimize negative impacts like time wasted in delays, environmental pollution, and greenhouse gas emissions. The two domains are quite distinct and require specialized solutions. This dissertation documents the developed techniques in two sections, addressing the two domains of demand and supply.
In the first section, a copula based approach is demonstrated to produce a reliable and accurate synthetic population, which is essential to estimate the demand correctly. The second section deals with big data analytics using simple models and fast algorithms to produce results in real-time. The techniques developed target short-term traffic forecasting, linking of multiple disparate datasets to power niche analytics, and quickly computing accurate measures of highway network performance to inform decisions made by facility operators in real-time. The analyses presented in this dissertation target many core aspects of transportation science, and enable the shared goal of providing safe, efficient and equitable service to travelers.

Synthetic population in transportation is used primarily to estimate transportation demand from an Activity Based Modeling (ABM) framework containing well-fitted behavioral and choice models. It allows accurate verification of the impacts of policies on the travel behavior of people, enabling confident implementation of policies, like setting transit fares or tolls, designed for the common benefit of many. Further, accurate demand models allow for resilient and resourceful planning of new infrastructure and assets, or repurposing of existing ones. On the other hand, short-term traffic speed predictions and reliable speed based performance measures are key in providing advanced ITS services, like real-time route guidance and traveler awareness, geared towards minimizing time, energy and resource wastage, and maximizing user satisfaction. Merging of datasets allows transfer of data such as traffic volumes and speeds between them, allowing computation of the global and network-wide impacts and externalities of transportation, like greenhouse gas emissions and the time, energy and resources consumed and wasted in traffic jams.
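
As a rough illustration of what copula based population synthesis involves, the sketch below draws synthetic (age, income) pairs whose marginals follow an observed sample and whose dependence comes from a copula. It is not the dissertation's procedure: the Gaussian copula family, the function names, the two attributes and the fixed dependence parameter `rho` are all assumptions made for this example, and in practice the copula and its parameters would be estimated from data.

```python
import random
from statistics import NormalDist


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: map u in [0, 1] to one of the observed values."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def synthesize_pairs(ages, incomes, rho, n, seed=0):
    """Draw n synthetic (age, income) pairs with a bivariate Gaussian copula.

    rho is an assumed dependence parameter on the latent normal scale;
    it is supplied by the caller here, not estimated from the sample.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    sorted_ages = sorted(ages)
    sorted_incomes = sorted(incomes)
    population = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # The normal CDF turns them into dependent uniforms (the copula sample).
        u1 = std_normal.cdf(z1)
        u2 = std_normal.cdf(z2)
        # Inverse empirical CDFs restore the observed marginal distributions.
        population.append(
            (empirical_quantile(sorted_ages, u1),
             empirical_quantile(sorted_incomes, u2))
        )
    return population


if __name__ == "__main__":
    # Hypothetical toy sample, for illustration only.
    sample_ages = [22, 35, 41, 58, 63]
    sample_incomes = [18000, 32000, 45000, 61000, 90000]
    for person in synthesize_pairs(sample_ages, sample_incomes, rho=0.6, n=5):
        print(person)
```

The point of the design is that the marginal distributions and the dependence structure are handled separately, so either one can be replaced (a different copula family, or parametric marginals) without touching the other.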
