Combining the Active Learning Algorithm Based on the Silhouette Coefficient with PCKmeans Algorithm

This paper discusses how to make semi-supervised clustering more effective by integrating active learning into it to guide the model. To address the complexity of the clustering algorithm, the paper presents an active semi-supervised k-means clustering model based on the silhouette coefficient. The model uses the pairwise-constraint clustering method of PCKmeans and actively selects valuable samples from which to establish constraints. It iterates until the number of queries reaches a threshold or the clustering achieves acceptable performance. To keep the algorithm stable, the method optimizes the semi-supervised k-means with a Local Sample Density (LDS) sampling strategy. In addition, a distance-based sampling method, which reduces the number of queries while increasing the number of constraint samples, is introduced to optimize the process of establishing pairwise constraints. These two methods significantly improve the effectiveness of the clustering algorithm. The experimental results indicate that the model outperforms k-means, PCKMeans, Min-Max, and LDS in MI and ARI, with gains of 5% and 6%, respectively.
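As a rough illustration of how a silhouette coefficient can drive sample selection, the sketch below computes the standard per-sample silhouette, s(i) = (b(i) - a(i)) / max(a(i), b(i)), and returns the samples with the lowest scores as query candidates. The function names `silhouette_samples` and `select_queries` are hypothetical, and the rule "query the lowest-silhouette samples first" is an assumption made for illustration; the abstract does not specify the exact selection criterion, and the LDS and distance-based sampling steps are not shown.

```python
import numpy as np


def silhouette_samples(X, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b), Euclidean distance.

    Assumes at least two clusters and no cluster made up entirely of
    identical points (otherwise max(a, b) could be zero).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix; fine for small illustrative data sets.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the sample's own cluster
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


def select_queries(X, labels, n_queries):
    """Hypothetical selection rule: indices of the lowest-silhouette samples."""
    scores = silhouette_samples(X, labels)
    return np.argsort(scores)[:n_queries]


# Toy data: the last point lies between the two groups, so its assignment
# is the least certain and it is the first candidate for a pairwise query.
X = [[0, 0], [0, 1], [10, 0], [10, 1], [4, 0.5]]
labels = [0, 0, 1, 1, 0]
print(select_queries(X, labels, n_queries=1))
```

In an active loop of the kind the abstract describes, the selected samples would be paired with other samples and shown to an oracle, and the resulting must-link or cannot-link answers would be passed to PCKmeans as constraints before re-clustering.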
