Evaluating performance of agglomerative clustering for extended NMF

Abstract

The world is full of data: it is ubiquitous and massive. In its raw form, this data is mined to extract useful information. Because data comes from many industries, it may also contain highly sensitive information. It is therefore important to secure the data before mining, so that no sensitive information is revealed while useful knowledge can still be extracted. Here, data privacy is achieved by perturbing the data with the Extended NMF (non-negative matrix factorization) method, and the perturbed data is then mined to extract knowledge. One mining technique is clustering, which organizes objects into groups whose members are similar to one another. These groups, called clusters, contain objects that are similar to each other but dissimilar to objects in other clusters. Hierarchical clustering organizes the data into a tree of clusters, called a dendrogram, which can be built bottom-up (agglomerative) or top-down (divisive). In the agglomerative method, the distance between clusters plays a central role; common distance measures include the Euclidean and Manhattan distances. Depending on how the distance between clusters is defined, several agglomerative methods exist: complete linkage, average linkage, single linkage, and Ward's linkage. In this paper, the authors compare these agglomerative methods on various criteria and identify which one performs best with Extended NMF.
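To make the linkage criteria concrete, the following is a minimal pure-Python sketch of naive agglomerative clustering. It is an illustration under stated assumptions, not the authors' implementation: the function names (`agglomerative`, `cluster_distance`) and the toy data are hypothetical, the Extended NMF perturbation step is not reproduced (the toy matrix merely stands in for perturbed data), and Ward's linkage is computed as the increase in within-cluster sum of squares, |A||B|/(|A|+|B|) times the squared Euclidean distance between centroids, so it ignores the chosen metric. The algorithm repeatedly merges the closest pair of clusters; the recorded merges are what a dendrogram draws.

```python
import math
from itertools import combinations


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def cluster_distance(A, B, data, linkage, metric):
    """Distance between clusters A and B (lists of row indices into data)."""
    if linkage == "ward":
        # Assumption: Ward = increase in within-cluster sum of squares after
        # merging A and B; defined in Euclidean geometry, so `metric` is ignored.
        ca = centroid([data[i] for i in A])
        cb = centroid([data[i] for i in B])
        sq = sum((a - b) ** 2 for a, b in zip(ca, cb))
        return len(A) * len(B) / (len(A) + len(B)) * sq
    pairwise = [metric(data[i], data[j]) for i in A for j in B]
    if linkage == "single":    # closest pair of members
        return min(pairwise)
    if linkage == "complete":  # farthest pair of members
        return max(pairwise)
    if linkage == "average":   # mean over all member pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(data, k, linkage="average", metric=euclidean):
    """Hypothetical helper: merge the closest clusters until k remain.

    Returns (clusters, merges); each merge is (A, B, distance).
    Naive O(n^3) recomputation, intended only for small examples.
    """
    clusters = [[i] for i in range(len(data))]  # start: one point per cluster
    merges = []
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], data, linkage, metric
            ),
        )
        d = cluster_distance(clusters[i], clusters[j], data, linkage, metric)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters], merges


if __name__ == "__main__":
    # Hypothetical toy data standing in for an NMF-perturbed matrix.
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
    for method in ("single", "complete", "average", "ward"):
        for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
            clusters, _ = agglomerative(data, k=2, linkage=method, metric=metric)
            # expected: [[0, 1, 2], [3, 4, 5]] for every combination
            print(method, name, sorted(clusters))
```

On well-separated toy data all four linkages agree; differences between them only appear on data with elongated, overlapping, or unevenly sized groups, which is where a comparison such as the one in this paper becomes informative. A production comparison would more likely use an optimized library routine than this naive loop.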