A comparison of outlier detection algorithms for ITS data

To improve the veracity and reliability of traffic models, or to extract valuable information from collected traffic data, outlier mining has been introduced into the traffic engineering domain to detect and analyze outliers in traffic data sets. Three typical outlier detection algorithms, namely the statistics-based approach, the distance-based approach, and the density-based local outlier approach, are described in terms of their principles, characteristics, and time complexity. The three algorithms are compared through application to intelligent transportation systems (ITS) data. Two traffic data sets of different dimensionality were used in our experiments: one of travel time data and one of traffic flow data. We conducted a number of experiments to recognize outliers hidden in the data sets before building the travel time prediction model and the traffic flow fundamental diagram. In addition, some artificially generated outliers were introduced into the traffic flow data to see how well the different algorithms detect them. Three strategies, based on ensemble learning, partition, and average LOF, are proposed to develop a better outlier recognizer. The experimental results show that these outlier mining methods are feasible and valid for detecting outliers in traffic data sets, and that they have good potential for use in traffic engineering. The comparison and analysis presented in this paper are expected to provide some insights to practitioners who plan to use outlier mining for ITS data.
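
As a rough illustration of the three families of algorithms named above, the following minimal Python sketch implements a z-score test (statistics-based), a DB(p, D)-style test (distance-based), and a basic local outlier factor (density-based). It is not the implementation used in the experiments: the function names, thresholds, and toy data are assumptions made for illustration only, and the LOF routine assumes the data contain no groups of more than k identical points.

```python
import math
import statistics


def zscore_outliers(values, threshold=3.0):
    """Statistics-based: indices of values more than `threshold`
    population standard deviations away from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def distance_outliers(points, radius, min_fraction):
    """Distance-based, DB(p, D) style: a point is flagged when at least
    `min_fraction` of the other points lie farther than `radius` from it."""
    n = len(points)
    flagged = []
    for i, p in enumerate(points):
        far = sum(1 for j, q in enumerate(points)
                  if j != i and math.dist(p, q) > radius)
        if far >= min_fraction * (n - 1):
            flagged.append(i)
    return flagged


def lof_scores(points, k):
    """Density-based: local outlier factor of every point.
    Scores near 1 indicate inliers; much larger scores indicate outliers.
    Assumes no more than k coincident points (otherwise a density is infinite)."""
    n = len(points)
    k_dist, neighbours = [], []
    for i in range(n):
        d = sorted((math.dist(points[i], points[j]), j)
                   for j in range(n) if j != i)
        kd = d[k - 1][0]
        k_dist.append(kd)
        # k-distance neighbourhood, including ties at the k-th distance
        neighbours.append([(dist, j) for dist, j in d if dist <= kd])
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist) for dist, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


# Hypothetical toy data, not taken from the paper's data sets.
travel_times = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 45]
flow_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
               (0.8, 0.9), (1.0, 1.3), (5.0, 5.0)]

z_flags = zscore_outliers(travel_times, threshold=2.5)
d_flags = distance_outliers(flow_points, radius=1.0, min_fraction=0.9)
lof = lof_scores(flow_points, k=3)
```

In this toy setting all three detectors single out the one isolated observation. The distance-based and LOF routines compare every pair of points, so this sketch scales quadratically with the number of observations.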
