Effective Pre-processing Methods with DTG Big Data by Using MapReduce Techniques

A huge amount of sensing data is generated by the large number of pervasive IoT devices. To extract meaningful information from such big data, pre-processing is essential: many outliers must be removed, because the quality of the sensing data deteriorates as time passes. In this paper, big data pre-processing methods are investigated and proposed. To evaluate the pre-processing methods for accurate analysis, we use a collection of digital tachograph (DTG) data, consisting of DTG sensing data obtained from six thousand vehicles in operation over one year. We studied five kinds of pre-processing methods: filtering by value ranges, excluding meaningless values, filtering by comparison between variables, applying statistical techniques, and finding driving patterns. In addition, we developed MapReduce programs in a Hadoop ecosystem and applied them to the big data to perform the pre-processing analysis. Across the pre-processing steps, we found that the proportion of DTG sensing data containing any error is up to 27.09 %. We also confirmed that the proposed methods detect outliers well, including those that are difficult to detect through simple range-error pre-processing.
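As an illustration of how range filtering and a statistical technique might be combined in a MapReduce style, the following is a minimal sketch in plain Python rather than the Hadoop programs used in this work. The field names (`vehicle_id`, `speed_kmh`, `rpm`), the valid ranges, and the z-score limit are all assumptions introduced for the example; they are not taken from the DTG data set.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical valid ranges; the thresholds actually used for DTG data are not stated here.
VALID_RANGES = {"speed_kmh": (0, 200), "rpm": (0, 8000)}


def map_record(record):
    """Map step: key each record by vehicle and tag it 'range_error' or 'ok'."""
    for field, (low, high) in VALID_RANGES.items():
        if not (low <= record[field] <= high):
            return record["vehicle_id"], ("range_error", record)
    return record["vehicle_id"], ("ok", record)


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_vehicle(vehicle_id, tagged, z_limit=2.0):
    """Reduce step: count range errors, then flag in-range speeds whose
    z-score exceeds z_limit (an assumed, illustrative threshold)."""
    range_errors = sum(1 for tag, _ in tagged if tag == "range_error")
    speeds = [rec["speed_kmh"] for tag, rec in tagged if tag == "ok"]
    stat_outliers = 0
    if len(speeds) >= 2:
        mu, sigma = mean(speeds), pstdev(speeds)
        if sigma > 0:
            stat_outliers = sum(1 for s in speeds if abs(s - mu) / sigma > z_limit)
    return {
        "vehicle_id": vehicle_id,
        "total": len(tagged),
        "range_errors": range_errors,
        "stat_outliers": stat_outliers,
    }


def run(records):
    """Run map, shuffle, and reduce in-process over an iterable of records."""
    groups = shuffle(map_record(r) for r in records)
    return [reduce_vehicle(key, values) for key, values in sorted(groups.items())]
```

In this sketch a speed of 150 km/h passes the range filter but can still be flagged by the per-vehicle statistical check when the rest of that vehicle's readings cluster around 60 km/h, which is the kind of outlier that simple range-error filtering alone would miss.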