Pervasive IoT devices generate a huge amount of sensing data. To extract meaningful information from such big data, pre-processing is essential: many outliers must be removed, because the data deteriorate as time passes. In this paper, we investigate and propose big data pre-processing methods. To evaluate these methods for accurate analysis, we use a collection of digital tachograph (DTG) data, obtained from six thousand vehicles driving over a year. We studied five kinds of pre-processing methods: filtering ranges, excluding meaningless values, comparing filters across variables, applying statistical techniques, and finding driving patterns. In addition, we developed MapReduce programs on the Hadoop ecosystem and deployed a big data system to perform the pre-processing analysis. Across the pre-processing steps, we confirmed that the proportion of DTG sensing data containing errors is up to 27.09%. We also confirmed that these methods detect outliers that are difficult to detect through simple range-error pre-processing.
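To make the first three kinds of checks concrete, the following is a minimal sketch of how range filtering, meaningless-value exclusion, and a cross-variable comparison could be expressed as a map step and a reduce step. It is not the authors' implementation: the field names (`speed_kmh`, `rpm`, `lat`, `lon`), the thresholds, and the specific rules are assumptions chosen for illustration, and the code runs in plain Python rather than on Hadoop.

```python
from collections import Counter

# Hypothetical valid ranges; the thresholds used in the paper are not stated here.
VALID_RANGES = {"speed_kmh": (0.0, 200.0), "rpm": (0.0, 8000.0)}
REQUIRED_FIELDS = ("speed_kmh", "rpm", "lat", "lon")


def classify(record):
    """Label one DTG record as 'meaningless', 'range', 'cross', or 'ok'."""
    # Excluding meaningless values: missing fields or a (0, 0) GPS fix.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return "meaningless"
    if record["lat"] == 0.0 and record["lon"] == 0.0:
        return "meaningless"
    # Filtering ranges: each variable must lie inside its assumed bounds.
    for field, (low, high) in VALID_RANGES.items():
        if not low <= record[field] <= high:
            return "range"
    # Comparing variables: a moving vehicle with zero engine RPM is suspicious
    # (an assumed rule, shown only as an example of a cross-variable check).
    if record["speed_kmh"] > 0 and record["rpm"] == 0:
        return "cross"
    return "ok"


def map_phase(records):
    """Emit (label, 1) pairs, as a MapReduce mapper would."""
    for record in records:
        yield classify(record), 1


def reduce_phase(pairs):
    """Sum the counts per label, as a MapReduce reducer would."""
    counts = Counter()
    for label, n in pairs:
        counts[label] += n
    return counts


def error_ratio(counts):
    """Fraction of records that failed at least one check."""
    total = sum(counts.values())
    return 0.0 if total == 0 else 1.0 - counts["ok"] / total
```

Because each record receives exactly one label, the per-label counts sum to the total number of records, so the error proportion can be read directly from the reducer output. Note that a record is labeled by the first check it fails, so the per-label counts depend on the order of the checks.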