In this paper, we propose a new eccentricity-based anomaly detection principle and algorithm. It builds on the recently introduced typicality and eccentricity data analytics (TEDA) framework. We compare TEDA with the traditional statistical approach and prove that TEDA generalizes it with regard to the well-known “nσ” analysis: TEDA gives exactly the same result as the traditional “nσ” analysis but does not require the restrictive prior assumptions on which the traditional approach relies. Moreover, it offers non-parametric, closed-form analytical descriptions (models of the data distribution) that are extracted from the actual data realizations rather than assumed in advance. In addition, for several types of proximity/similarity measures (such as Euclidean, cosine, and Mahalanobis) it can be calculated recursively and thus very efficiently, which makes it suitable for real-time and online algorithms. Building on this exact, per-sample information about the data distribution in closed analytical form, we propose a new, less conservative and more sensitive condition for anomaly detection. It differs substantially from the traditional “nσ” type conditions. We demonstrate an example in which the traditional conditions lead to more false negatives or false positives than the proposed condition. The new condition is intuitive and easy to check for an arbitrary data distribution and an arbitrarily small number of data samples (but no fewer than 3). Finally, anomaly/novelty/change detection is a basic and very important data analysis operation that underlies higher-level tasks such as fault detection, drift detection in data streams, clustering, outlier detection, autonomous video analytics, and particle physics; we therefore point to some possible applications, which will be the subject of future work.
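The claimed equivalence with the “nσ” analysis and the recursive calculation can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: eccentricity is taken, as is common in the TEDA literature, to be twice the sum of squared Euclidean distances from a sample to all samples, divided by the sum over all pairs. For scalar data and the biased variance σ² this reduces to ξ = 1/k + (x − μ)²/(kσ²), so that |x − μ| > nσ holds exactly when ξ > (n² + 1)/k. The function names are hypothetical, and the sketch covers only this nσ-equivalent threshold, not the new condition proposed in the paper.

```python
def nsigma_flags(xs, n):
    """Traditional check: flag x when |x - mean| > n * sigma (biased sigma)."""
    k = len(xs)
    mean = sum(xs) / k
    var = sum((x - mean) ** 2 for x in xs) / k
    return [(x - mean) ** 2 > n * n * var for x in xs]


def eccentricity_batch(xs):
    """Assumed definition: 2 * sum_j d(x, x_j)^2 / sum_i sum_j d(x_i, x_j)^2."""
    total = sum((a - b) ** 2 for a in xs for b in xs)
    return [2.0 * sum((x - b) ** 2 for b in xs) / total for x in xs]


def eccentricity_flags(xs, n):
    """Flag x when its eccentricity exceeds (n^2 + 1) / k."""
    k = len(xs)
    return [e > (n * n + 1) / k for e in eccentricity_batch(xs)]


def eccentricity_recursive(xs):
    """Eccentricity of each sample on arrival, from running moments only.

    Keeps the running mean and mean of squares, so each update is O(1).
    Returns None while the variance is still zero (e.g. the first sample).
    """
    k, mean, mean_sq, out = 0, 0.0, 0.0, []
    for x in xs:
        k += 1
        mean += (x - mean) / k
        mean_sq += (x * x - mean_sq) / k
        var = mean_sq - mean * mean  # biased variance of the first k samples
        out.append(1.0 / k + (x - mean) ** 2 / (k * var) if var > 0 else None)
    return out


data = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 30.0]
flags_sigma = nsigma_flags(data, n=2)
flags_ecc = eccentricity_flags(data, n=2)
```

On this illustrative data set both checks flag only the last sample (30.0) for n = 2, and the last value returned by `eccentricity_recursive(data)` matches the batch eccentricity of that sample, since both are computed over the same ten points.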