A new evolving clustering algorithm for online data streams

In this paper, we propose a new approach to fuzzy data clustering. We present a new algorithm, called TEDA-Cloud, based on the recently introduced TEDA approach to outlier detection. TEDA-Cloud is a statistical method that uses the concepts of typicality and eccentricity to group similar data observations. Instead of traditional clusters, the data is grouped into granular units called data clouds, which are structures with no pre-defined shape or fixed boundaries. TEDA-Cloud is a fully autonomous, self-evolving algorithm that can be used to cluster online data streams and in applications that require real-time response. It is able to “start from scratch” (from an empty knowledge base) and to create, update and merge data clouds without requiring any user-defined parameters (e.g. number of clusters, size, radius) or previous training. Moreover, unlike most traditional statistical approaches, TEDA-Cloud does not rely on a specific data distribution or on the assumption that data samples are independent. The results, obtained on multiple data sets that are well known in the literature, are very encouraging.
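To make the notions of eccentricity and typicality more concrete, the following is a minimal sketch of how eccentricity can be updated recursively over a stream, using the formulas as they are commonly stated in the TEDA literature rather than anything given in this abstract. The class name `EccentricityTracker`, the helper `is_outlier`, and the constant `m = 3` in the Chebyshev-style threshold are assumptions introduced for illustration only. The sketch covers a single stream-wide estimate and is not the TEDA-Cloud procedure for creating, updating and merging data clouds.

```python
from typing import List, Sequence


class EccentricityTracker:
    """Recursive mean/variance with a TEDA-style eccentricity (illustrative sketch)."""

    def __init__(self) -> None:
        self.k = 0                    # number of samples seen so far
        self.mean: List[float] = []   # running mean vector
        self.var = 0.0                # running scalar variance

    def update(self, x: Sequence[float]) -> float:
        """Absorb sample x and return its eccentricity (typicality is 1 - eccentricity)."""
        self.k += 1
        k = self.k
        if k == 1:
            # Assumption: a lone first sample is treated as maximally eccentric.
            self.mean = list(x)
            self.var = 0.0
            return 1.0
        # Recursive mean update.
        self.mean = [(k - 1) / k * m + xi / k for m, xi in zip(self.mean, x)]
        # Squared distance of the new sample to the updated mean.
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        # Recursive variance update.
        self.var = (k - 1) / k * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            # All samples identical so far: only the 1/k term remains.
            return 1.0 / k
        return 1.0 / k + dist2 / (k * self.var)


def is_outlier(eccentricity: float, k: int, m: float = 3.0) -> bool:
    """Chebyshev-style test on the normalized eccentricity (m is an assumed constant)."""
    return eccentricity / 2.0 > (m * m + 1.0) / (2.0 * k)


if __name__ == "__main__":
    tracker = EccentricityTracker()
    stream = [[0.9], [1.1]] * 10 + [[10.0]]  # 20 regular samples, then one far away
    for sample in stream:
        ecc = tracker.update(sample)
        print(tracker.k, round(ecc, 4), is_outlier(ecc, tracker.k))
```

Each update needs only the previous mean, variance and sample count, so no past samples are stored, which is what makes this kind of statistic suitable for online data streams.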
