Statistical Approaches to Detect Anomalies

The term anomaly is derived from the Greek word anomalia, meaning unevenness or irregularity. In statistical terminology, anomalies are often referred to as outliers. If we plot a data set, the points that are similar to one another lie densely together, whereas a few points that do not conform to the rest of the data lie far away from them. We call these points outliers or anomalies. Anomaly detection is also called deviation detection, because anomalous objects have attribute values that differ significantly from the expected or typical values. It is also called exception mining, because anomalies are exceptional in some sense. An anomalous data object is unusual, irregular or in some way inconsistent with the other data objects.

Being unusual does not necessarily mean occurring infrequently in absolute terms. In a large data set or a continuous data stream, an event that occurs only ‘one in a thousand’ times will still occur millions of times across billions of events.

Approaches to finding anomalies in data sets include statistical, proximity-based, density-based and cluster-based methods. Statistical approaches are model-based: a model is built for the data, and each object is evaluated by how well it fits that model. In this paper, we discuss various statistical approaches to anomaly detection. Most statistical approaches to outlier detection are based on developing a probability distribution model and considering how probable objects are under that model.
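The idea of judging objects by how probable they are under a fitted distribution can be sketched in a few lines of Python. This is a minimal illustration, not a specific method from the literature: it assumes a single numeric attribute that is roughly normally distributed, fits a normal model using the sample mean and standard deviation, and flags objects whose two-sided tail probability falls below a significance level. The function name `gaussian_outliers` and the parameter `alpha` are hypothetical names chosen for this example.

```python
from statistics import NormalDist, mean, stdev


def gaussian_outliers(values, alpha=0.01):
    """Return (value, p_value) pairs that are improbable under a fitted normal model.

    Sketch only: assumes one numeric attribute that is roughly Gaussian.
    The model parameters are estimated from the same data being tested.
    """
    if len(values) < 2:
        return []  # stdev needs at least two points
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates from the model
    model = NormalDist(mu, sigma)
    flagged = []
    for x in values:
        # Two-sided tail probability of a value at least this far from the mean.
        p = 2 * min(model.cdf(x), 1 - model.cdf(x))
        if p < alpha:
            flagged.append((x, p))
    return flagged


data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 30]
for value, p in gaussian_outliers(data):
    print(f"{value} is an outlier (p = {p:.4f})")
```

Only 30 is flagged here. It lies about 2.98 estimated standard deviations from the mean, so a fixed "3-sigma" rule would narrowly miss it, partly because the outlier itself inflates the estimated standard deviation. Expressing the cutoff as a probability (alpha = 0.01, equivalent to |z| > 2.576) ties the decision directly to the distribution model.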
