Development of new agglomerative and performance evaluation models for classification

This study proposes two new hierarchical clustering methods, namely weighted and neighbourhood clustering. They address three shortcomings of existing hierarchical clustering methods: low accuracy, an inability to separate clusters properly, and a tendency to produce too many clusters. We also propose three new criteria for assessing the performance of clustering methods: (1) overall effectiveness, defined as the product of the overall efficiency and the accuracy of the clusters, which evaluates hierarchical clustering methods on class-label datasets; (2) a modified structure strength S(c), which addresses a problem with using the structure strength in hierarchical clustering to determine the number of clusters in non-class-label datasets; and (3) the R-value, the ratio of the determinant of the between-cluster sum of squares and cross-products (SSCP) matrix to the determinant of the within-cluster SSCP matrix, which helps validate the performance of hierarchical clustering methods on non-class-label datasets. The proposed algorithms achieved high accuracy, separated the clusters properly, and grouped the data into fewer clusters. We compared them with existing algorithms using the newly developed performance criteria, and the new algorithms performed better. All experiments were carried out on twelve class-label and six non-class-label datasets.
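The R-value above is defined only in words. The following is a minimal NumPy sketch of that ratio and is not the authors' implementation. It assumes an n × p data array and cluster labels produced by any clustering routine. It also assumes the standard SSCP definitions: between-cluster B = Σ_k n_k (x̄_k − x̄)(x̄_k − x̄)ᵀ and within-cluster W = Σ_k Σ_{i∈k} (x_i − x̄_k)(x_i − x̄_k)ᵀ. The function names `sscp_matrices` and `r_value` are hypothetical.

```python
import numpy as np


def sscp_matrices(X, labels):
    """Return (B, W): between- and within-cluster SSCP matrices.

    Assumes the standard definitions (not taken from the paper):
      B = sum_k n_k (mean_k - grand_mean)(mean_k - grand_mean)^T
      W = sum_k sum_{i in k} (x_i - mean_k)(x_i - mean_k)^T
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n observations of one variable
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for k in np.unique(labels):
        Xk = X[labels == k]               # observations assigned to cluster k
        mean_k = Xk.mean(axis=0)
        dev = Xk - mean_k                 # deviations from the cluster mean
        W += dev.T @ dev                  # within-cluster cross products
        diff = (mean_k - grand_mean)[:, None]
        B += len(Xk) * (diff @ diff.T)    # size-weighted between-cluster term
    return B, W


def r_value(X, labels):
    """Hypothetical helper: det(B) / det(W) for a given clustering."""
    B, W = sscp_matrices(X, labels)
    return np.linalg.det(B) / np.linalg.det(W)
```

Consider `r_value([0, 2, 10, 12], [0, 0, 1, 1])`. Here B = 100 and W = 4, so the ratio is 25. The labels could come from any agglomerative routine, for instance SciPy's `scipy.cluster.hierarchy.fcluster` applied to a `linkage` result.

The ratio as literally defined has two limits. First, B has rank at most K − 1 for K clusters, so det(B) is zero whenever K − 1 is smaller than the number of variables. Second, det(W) must be nonzero for the ratio to exist.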
