Visual data quality analysis for taxi GPS data

We present a novel visual analysis method for systematically discovering data quality problems in raw taxi GPS data. The method combines semi-supervised active learning with interactive visual exploration, helping analysts interactively discover unknown data quality problems and automatically extract known ones. We report analysis results on Beijing taxi GPS data.
