Group Labeling Methodology Using Distance-based Data Grouping Algorithms

Clustering algorithms are often used to form groups based on the similarity of their members. In this context, understanding a group is just as important as knowing its composition. Identifying, or labeling, groups can assist with their interpretation and, consequently, guide decision-making by taking into account the features of each group. Interpreting groups is beneficial when it is necessary to know what makes an element part of a given group, what the main features of a group are, and what differences and similarities exist among groups. This work describes a method for finding relevant features and generating labels for the elements of each group, uniquely identifying the groups. In this way, our approach addresses the problem of finding relevant definitions that can identify groups. The proposed method transforms the standard output of an unsupervised distance-based clustering algorithm into a Pertinence Degree (GP), where each element of the database receives a GP with respect to each formed group. The elements, together with their GPs, are used to formulate ranges of values for their attributes. Such ranges can identify the groups uniquely. The labels produced by this approach averaged 94.83% correct answers for the analyzed databases, allowing a natural interpretation of the generated definitions.
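The pipeline summarized above (clustering output, then a GP per element and group, then attribute ranges used as labels) can be illustrated with a minimal sketch. The abstract does not state how the GP is computed, so the normalized inverse-distance formula, the `threshold` parameter, the toy data, and all function names below are assumptions made only for illustration; they are not the authors' definitions.

```python
import numpy as np

def pertinence_degrees(X, centroids, eps=1e-12):
    """ASSUMED GP: normalized inverse Euclidean distance to each centroid."""
    # dist[i, g] = distance from element i to the centroid of group g
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    inv = 1.0 / (dist + eps)          # eps avoids division by zero
    return inv / inv.sum(axis=1, keepdims=True)

def attribute_ranges(X, gp, threshold=0.6):
    """For each group, [min, max] per attribute over elements whose GP
    for that group is at least `threshold` (a hypothetical cut-off)."""
    ranges = {}
    for g in range(gp.shape[1]):
        core = X[gp[:, g] >= threshold]
        if len(core):
            ranges[g] = np.stack([core.min(axis=0), core.max(axis=0)], axis=1)
    return ranges

def matches(x, rng):
    """True if every attribute of x lies inside the group's ranges."""
    return bool(np.all((x >= rng[:, 0]) & (x <= rng[:, 1])))

# Toy data standing in for the output of a distance-based clustering run.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([X[labels == g].mean(axis=0) for g in range(2)])

gp = pertinence_degrees(X, centroids)
ranges = attribute_ranges(X, gp)

# Fraction of elements that fall inside the ranges of their own group only.
hits = [matches(x, ranges[l]) and not matches(x, ranges[1 - l])
        for x, l in zip(X, labels)]
accuracy = sum(hits) / len(hits)
```

Under these assumptions, each group's label is the set of per-attribute intervals in `ranges`, and `accuracy` is one possible way to score how well those intervals identify the groups.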
