Evaluating performance of agglomerative clustering for extended NMF

Abstract: The world is full of data, and that data is ubiquitous and massive. It arrives in raw form and is mined to extract useful information. Because it comes from many industries, it may also contain highly sensitive information. Securing the data before mining is therefore important: the mining should not reveal any sensitive information while still extracting useful knowledge. Here, data privacy is attained by perturbing the data with the Extended NMF method, and the perturbed data is then mined for knowledge. Mining can be done by clustering, which organizes objects into groups whose members are similar to one another. These groups are called clusters: objects within a cluster are similar to each other but dissimilar to objects in other clusters. Hierarchical clustering organizes the data into a tree of clusters, called a dendrogram, which can be built bottom-up (agglomerative) or top-down (divisive). The distance between clusters plays a central role in the agglomerative method, and common point-to-point distance measures include the Euclidean and Manhattan distances. Depending on how the distance between two clusters is computed from these pairwise distances, several agglomerative methods exist: complete linkage, average linkage, single linkage and Ward's linkage. In this paper, the authors compare these agglomerative methods on various criteria and identify the best-performing one when used with Extended NMF.
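The abstract describes a two-stage pipeline: perturb the data with an NMF-based method, then cluster the perturbed data with agglomerative clustering under different linkages and distance measures. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The Extended NMF method is not specified here, so plain NMF with Lee-Seung multiplicative updates stands in for the perturbation step, and the released (perturbed) data is assumed to be the product W·H. The function names (`nmf`, `agglomerative`, `cluster_distance`) and the toy data are hypothetical, and the clustering loop is deliberately naive, suitable only for small inputs.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates (Frobenius loss).
    Assumption: a stand-in only, NOT the paper's Extended NMF."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def cluster_distance(A, B, X, linkage, dist):
    """Distance between clusters A and B (lists of row indices into X)."""
    if linkage == "ward":
        # Increase in within-cluster sum of squares caused by merging A and B.
        ca, cb = centroid([X[i] for i in A]), centroid([X[i] for i in B])
        return len(A) * len(B) / (len(A) + len(B)) * euclidean(ca, cb) ** 2
    pair = [dist(X[i], X[j]) for i in A for j in B]
    if linkage == "single":
        return min(pair)             # closest pair of members
    if linkage == "complete":
        return max(pair)             # farthest pair of members
    if linkage == "average":
        return sum(pair) / len(pair)  # mean over all member pairs
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(X, k, linkage="average", dist=euclidean):
    """Naive bottom-up clustering: start from singletons and repeatedly
    merge the closest pair of clusters until k clusters remain."""
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of rows")
    if linkage == "ward" and dist is not euclidean:
        raise ValueError("Ward linkage is defined for Euclidean distance only")
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        a, b = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: cluster_distance(
                clusters[pq[0]], clusters[pq[1]], X, linkage, dist),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical toy data: two groups of non-negative records.
    V = [[1, 2, 1], [1.2, 1.8, 1.1], [0.9, 2.1, 1.0],
         [8, 9, 7], [8.2, 8.8, 7.1], [7.9, 9.1, 6.8]]
    W, H = nmf(V, rank=2)
    V_pert = matmul(W, H)  # assumed released (perturbed) data
    for link in ("single", "complete", "average", "ward"):
        print(link, agglomerative(V_pert, k=2, linkage=link))
```

Changing `linkage` and `dist` mirrors the kind of comparison the abstract describes. For real datasets, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` (methods `single`, `complete`, `average`, `ward`) would replace the naive merge loop, which recomputes every pairwise cluster distance at each step.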

Payal Pahwa | Neetika Bhandari
