A big data analytics framework for border crossing transportation

In this paper, we present a framework for developing a comprehensive system to analyse border crossing transportation using an open-source meta-data acquisition and aggregation tool. The framework takes a platform integration approach, based on Hadoop, MapReduce and MongoDB, to consolidate databases from both the USA and Mexico. We design a data-driven XML schema for tagging data entries that come from different sources in different formats, and implement a package in the open-source software R to aggregate the XML-transformed data along time and space dimensions. The aggregated data is then analysed with a difference-in-differences (DiD) estimation model to understand the behaviour of border crossing transportation.
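As a rough illustration of the DiD idea mentioned above, the following is a minimal sketch in Python of the basic two-group, two-period estimator: the change in the mean outcome of the treated group minus the change in the mean outcome of the control group. It is not the model specification used in the paper, which the abstract does not give. The function name `did_estimate`, the record layout, and the crossing counts are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def did_estimate(records):
    """Basic 2x2 difference-in-differences estimate.

    `records` is an iterable of (treated, post, outcome) tuples, where
    `treated` and `post` are booleans and `outcome` is a number.
    This layout is an assumption made for this sketch.
    """
    # Group outcomes into the four (treated, post) cells.
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in records:
        cells[(bool(treated), bool(post))].append(outcome)

    # The estimator is undefined if any cell has no observations.
    if any(not values for values in cells.values()):
        raise ValueError("each (treated, post) cell needs at least one record")

    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Hypothetical monthly crossing counts (thousands); not data from the paper.
records = [
    (True, False, 100.0), (True, False, 104.0),   # treated crossing, before
    (True, True, 90.0), (True, True, 94.0),       # treated crossing, after
    (False, False, 80.0), (False, False, 84.0),   # control crossing, before
    (False, True, 82.0), (False, True, 86.0),     # control crossing, after
]

effect = did_estimate(records)
print(effect)  # -12.0
```

With these made-up numbers the treated group falls by 10 while the control group rises by 2, so the estimate is -12. A regression-based DiD with covariates and fixed effects would be the usual next step, but that goes beyond what the abstract states.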