In recent years, transport agencies have collected ever larger volumes of GPS data from probe vehicles, making data mining on this traffic GPS data necessary. However, because GPS data are distributed non-uniformly and discontinuously across the network, their quality is unreliable, and it degrades further when much of the data is noisy. Moreover, if researchers lack another type of traffic data for verification, such as loop sensor data, the results of data mining become unreliable. We therefore present an approach for multidimensional quality analysis of traffic GPS data using a data cube model. We propose data valid density, data ideality, and an overall indicator to describe the quality of GPS data. The experimental results show that our approach can describe the data quality status of the network and help evaluate the reliability of traffic parameter estimates.
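To make the data cube idea concrete, the following minimal sketch aggregates GPS records into cells keyed by road segment and hour, then rolls the counts up along either dimension. It is an illustration under stated assumptions, not the method of this paper: the record layout, the speed-based validity rule, and the names `is_valid`, `build_cube`, `valid_share`, and `roll_up` are all hypothetical, and `valid_share` is a simple stand-in rather than the definition of data valid density or data ideality, which this abstract does not give.

```python
from collections import defaultdict

# Hypothetical record layout: (segment_id, hour_of_day, speed_kmh).
# The validity rule (0 < speed <= max_speed) is an illustrative assumption.
def is_valid(speed_kmh, max_speed=120.0):
    return speed_kmh is not None and 0.0 < speed_kmh <= max_speed

def build_cube(records):
    """Count total and valid records per (segment_id, hour) cell."""
    cube = defaultdict(lambda: {"total": 0, "valid": 0})
    for segment_id, hour, speed in records:
        cell = cube[(segment_id, hour)]
        cell["total"] += 1
        if is_valid(speed):
            cell["valid"] += 1
    return dict(cube)

def valid_share(cell):
    """Fraction of records in a cell that pass the validity rule."""
    return cell["valid"] / cell["total"] if cell["total"] else 0.0

def roll_up(cube, axis):
    """Aggregate cells along one dimension.

    axis=0 groups by segment (summing over hours);
    axis=1 groups by hour (summing over segments).
    """
    out = defaultdict(lambda: {"total": 0, "valid": 0})
    for key, cell in cube.items():
        out[key[axis]]["total"] += cell["total"]
        out[key[axis]]["valid"] += cell["valid"]
    return dict(out)

# Made-up sample records, for illustration only.
records = [
    ("A", 8, 45.0),
    ("A", 8, 0.0),    # stationary reading, treated as invalid here
    ("A", 9, 60.0),
    ("B", 8, 200.0),  # implausible speed, treated as invalid here
]
cube = build_cube(records)
by_segment = roll_up(cube, axis=0)
by_hour = roll_up(cube, axis=1)
```

Rolling up by segment or by hour gives the same quality measure at different granularities, which is the kind of multidimensional view a data cube is meant to support.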