Self-organised direction aware data partitioning algorithm

In this paper, a novel fully data-driven algorithm, named Self-Organised Direction Aware (SODA), is proposed for partitioning data and forming data clouds. The SODA algorithm employs an extra cosine similarity-based directional component alongside a traditional distance metric and thus takes advantage of both spatial and angular divergences. Using the nonparametric Empirical Data Analytics (EDA) operators, the algorithm automatically identifies the main modes of the data pattern from the empirically observed data samples and uses them as focal points to form data clouds. A streaming data processing extension of the SODA algorithm is also proposed. This extension is able to self-adjust the structure and parameters of the data clouds to follow possibly changing data patterns and processes. Numerical examples, provided as a proof of concept, illustrate that the proposed algorithm operates autonomously and demonstrate its high clustering performance and computational efficiency.
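
To make the idea of pairing a spatial and an angular divergence concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Euclidean distance as the "traditional distance metric" and uses sqrt(1 - cos θ) as one possible cosine similarity-based dissimilarity; the abstract does not specify either choice, and the function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Spatial divergence: the ordinary Euclidean distance (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def angular_dissimilarity(x, y):
    """Angular divergence, here sqrt(1 - cos(theta)) (an assumed form)."""
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, cosine_similarity(x, y)))
    return math.sqrt(1.0 - cos_theta)


def divergences(x, y):
    """Return the (spatial, angular) pair for two samples."""
    return euclidean_distance(x, y), angular_dissimilarity(x, y)


if __name__ == "__main__":
    # Same direction, different magnitude: spatially apart, angularly identical.
    print(divergences((1.0, 0.0), (2.0, 0.0)))
    # Orthogonal directions: both components are non-zero.
    print(divergences((1.0, 0.0), (0.0, 1.0)))
```

The two samples in the first call lie on the same ray from the origin, so only the spatial component separates them. This is the kind of case where keeping both components gives more information than either one alone.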
