Paper Information - Effective Pre-processing Methods with DTG Big Data by Using MapReduce Techniques

Effective Pre-processing Methods with DTG Big Data by Using MapReduce Techniques

A huge amount of sensing data is generated by a large number of pervasive IoT devices. To find meaningful information in this big data, pre-processing is essential: many outliers must be removed because the data deteriorate as time passes. In this paper, big data pre-processing methods are investigated and proposed. To evaluate the pre-processing methods for accurate analysis, we use a collection of digital tachograph (DTG) data, obtained from six thousand vehicles driving over a year. We studied five kinds of pre-processing methods: filtering ranges, excluding meaningless values, comparing filters from variables, applying statistical techniques, and finding driving patterns. In addition, we developed MapReduce programs using the Hadoop ecosystem and applied them to the big data to perform the pre-processing analysis. Through the pre-processing steps, we confirmed that the proportion of DTG sensing data containing any error is up to 27.09%. We also confirmed that the methods detect outliers well, including outliers that are difficult to detect through simple range-error pre-processing.
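As an illustration of the first two methods named in the abstract (filtering ranges and excluding meaningless values), the following is a minimal sketch written in a map/reduce style in plain Python. It is not the authors' Hadoop implementation: the field names `speed_kmh` and `rpm`, the thresholds in `VALID_RANGES`, and the sample records are all hypothetical and chosen only for demonstration.

```python
from collections import Counter
from functools import reduce

# Hypothetical valid ranges; the paper's actual thresholds are not stated here.
VALID_RANGES = {
    "speed_kmh": (0.0, 200.0),
    "rpm": (0.0, 8000.0),
}

def map_record(record):
    """Map step: emit ("error", 1) if any field is missing or out of range,
    otherwise emit ("ok", 1)."""
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            return ("error", 1)
    return ("ok", 1)

def reduce_counts(acc, pair):
    """Reduce step: sum the emitted counts per key."""
    key, count = pair
    acc[key] += count
    return acc

def error_proportion(records):
    """Fraction of records flagged as erroneous (0.0 for an empty input)."""
    counts = reduce(reduce_counts, map(map_record, records), Counter())
    total = counts["ok"] + counts["error"]
    return counts["error"] / total if total else 0.0

# Hypothetical sample records, for demonstration only.
records = [
    {"speed_kmh": 62.0, "rpm": 1800.0},
    {"speed_kmh": 340.0, "rpm": 2100.0},  # speed outside the assumed range
    {"speed_kmh": 48.0, "rpm": None},     # missing (meaningless) value
    {"speed_kmh": 75.0, "rpm": 2500.0},
]
print(error_proportion(records))
```

In an actual Hadoop job the map and reduce functions would run as distributed tasks over the stored DTG records; the sketch only shows how a per-record range check and a per-key sum combine to give an error proportion of the kind the abstract reports.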

Eunmi Choi | Wonhee Cho

[1] Jiawei Han et al. Data Mining: Concepts and Techniques, 2000.

[2] Antonio Iera et al. The Internet of Things: A survey, 2010, Comput. Networks.

[3] Carlos Soares et al. Estimating Fuel Consumption from GPS Data, 2015, IbPRIA.

[4] Tom White et al. Hadoop: The Definitive Guide, 2009.

[5] Seok-June Lee et al. Short-Term Impact Analysis of DTG Installation for Commercial Vehicles, 2012.

[6] Eunmi Choi et al. A GPS Trajectory Map-Matching Mechanism with DTG Big Data on the HBase System, 2015, BigDAS.