Grid-based outlier detection in large data sets for combine harvesters

Outlier detection is one of the most widely used techniques for identifying abnormal behavior in raw data. Abnormal deviations, in the sense used here, include not only human-made or system errors that naturally occur in the data but also rarely occurring events. In this paper, we propose a new algorithm called Grid Based Outlier Detection (GBOD) to find hidden outliers in large data sets. Existing grid-based methods are limited to certain statistical approaches. In contrast, GBOD comes in two variants that capture different ranges of outliers, depending on the user's interest. First, the number of points in a local grid cell is used to decide whether a point is an outlier. In a second step, this approach is extended to a method that assigns an outlier score to each data point. The simple design makes the algorithm extremely efficient for large data sets.
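The two variants described above can be sketched as follows. This is a minimal Python illustration under our own assumptions and is not the GBOD implementation. The cell width `cell_size`, the threshold `min_count`, the helper names, and the scoring formula `1 - count / max_count` are all hypothetical choices. Each point is judged only by the occupancy of its own cell, not by neighboring cells.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def cell_of(p: Point, cell_size: float) -> Cell:
    # Map a point to the integer index of the grid cell containing it.
    # Floor division keeps negative coordinates in the correct cell.
    return tuple(int(c // cell_size) for c in p)


def cell_counts(points: Sequence[Point], cell_size: float) -> Counter:
    # One pass over the data; only occupied cells are stored.
    return Counter(cell_of(p, cell_size) for p in points)


def grid_outliers(points: Sequence[Point], cell_size: float, min_count: int) -> List[bool]:
    # Variant 1 (hypothetical rule): a point is an outlier if its cell
    # holds fewer than min_count points.
    counts = cell_counts(points, cell_size)
    return [counts[cell_of(p, cell_size)] < min_count for p in points]


def grid_scores(points: Sequence[Point], cell_size: float) -> List[float]:
    # Variant 2 (hypothetical score): 1 - count / max_count, so points in
    # the densest cell score 0 and isolated points score close to 1.
    if not points:
        return []
    counts = cell_counts(points, cell_size)
    max_count = max(counts.values())
    return [1.0 - counts[cell_of(p, cell_size)] / max_count for p in points]


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (0.4, 0.3), (5.0, 5.0)]
    print(grid_outliers(pts, cell_size=1.0, min_count=2))  # [False, False, False, False, True]
    print(grid_scores(pts, cell_size=1.0))                 # [0.0, 0.0, 0.0, 0.0, 0.75]
```

Each point is mapped to its cell exactly once, and the counts live in a dictionary of occupied cells. Both functions therefore run in time linear in the number of points, which fits the efficiency argument above. A practical version would still need a principled way to choose `cell_size`.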
