A new type of distance metric and its use for clustering

This paper introduces a new ‘direction-aware’ distance metric for high-dimensional problems. The distance combines two components: (1) the traditional Euclidean distance and (2) an angular/directional divergence derived from the cosine similarity. Because it is defined over the Euclidean space domain, it retains the advantages of both the Euclidean metric and the cosine similarity without leaving that domain. The direction-aware distance has a wide range of applicability and can serve as an alternative distance measure in various traditional clustering approaches to improve their handling of high-dimensional problems. The paper also proposes a new evolving clustering algorithm based on this distance. Numerical examples on benchmark datasets show that the direction-aware distance can effectively improve the clustering quality of the k-means algorithm on high-dimensional problems, and that the proposed evolving clustering algorithm is an effective tool for processing high-dimensional data streams.
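The abstract does not state the exact formula, so the following is only a minimal sketch of how a Euclidean term and a cosine-derived angular term might be combined into one distance. The quadratic combination, the angular term `sqrt(1 - cos(theta))`, and the weights `w_mag` and `w_ang` are assumptions made for illustration, not the paper's confirmed definition.

```python
import math


def euclidean(x, y):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def angular_divergence(x, y):
    """Cosine-derived divergence sqrt(1 - cos(theta)) (assumed form).

    Both vectors must be non-zero, since the angle is undefined otherwise.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("angular divergence is undefined for zero vectors")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.sqrt(1.0 - cos_theta)


def direction_aware_distance(x, y, w_mag=1.0, w_ang=1.0):
    """Hypothetical weighted combination of the two components.

    w_mag and w_ang are illustrative weights, not values from the paper.
    """
    d_mag = euclidean(x, y)
    d_ang = angular_divergence(x, y)
    return math.sqrt((w_mag * d_mag) ** 2 + (w_ang * d_ang) ** 2)


if __name__ == "__main__":
    # Same direction, different length: only the magnitude term contributes.
    print(direction_aware_distance([1.0, 0.0], [2.0, 0.0]))
    # Orthogonal unit vectors: both terms contribute.
    print(direction_aware_distance([1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, such a function could replace the Euclidean distance in the assignment step of k-means, with the weights controlling how strongly direction is taken into account relative to magnitude.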
