Consensus rate-based label propagation for semi-supervised classification

Abstract

Label propagation is one of the most widely used semi-supervised classification methods. It exploits the neighborhood structure of the observations to apply the smoothness assumption, which states that observations close to each other are likely to share a label. However, a single neighborhood structure cannot adequately reflect the intrinsic structure of the data, so existing label propagation methods may fail to perform well. To overcome this limitation, we propose a label propagation algorithm based on consensus rates, which are calculated by summarizing multiple clustering solutions so as to incorporate various properties of the data. As a result, the proposed algorithm can effectively reflect the intrinsic data structure and yield accurate classification results. Experiments on various benchmark datasets examine the properties of the proposed algorithm and compare it with existing label propagation methods. The experimental results confirm that the proposed algorithm outperforms the existing methods.
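The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the consensus rate of a pair of observations is the fraction of clustering solutions that place both in the same cluster, in the style of evidence accumulation (Fred and Jain). It also assumes that the solutions come from k-means runs with several values of k and random initializations. Labels are then propagated over this consensus matrix with the closed-form label-spreading rule of Zhou et al. ("learning with local and global consistency"), which is swapped in here for illustration. The names `consensus_rates`, `propagate`, `ks`, `runs_per_k`, and `alpha` are hypothetical and are not the authors' notation.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=100):
    """Lloyd's k-means from randomly chosen initial centers; returns a label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each observation to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members; keep it in place if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_rates(X, ks=(2, 3, 4, 5), runs_per_k=10, seed=0):
    """Assumed definition: fraction of clustering solutions that put i and j in one cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    C = np.zeros((n, n))
    n_solutions = 0
    # Assumption: diversity comes from varying k and the random initialization.
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans_labels(X, k, rng)
            C += labels[:, None] == labels[None, :]
            n_solutions += 1
    return C / n_solutions


def propagate(W, y, alpha=0.99):
    """Label spreading (Zhou et al.) on affinity W; y holds class ids, -1 for unlabeled."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)  # no self-loops
    d = W.sum(axis=1)
    d[d == 0] = 1.0  # avoid division by zero for isolated observations
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization D^{-1/2} W D^{-1/2}
    classes = np.unique(y[y >= 0])
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot rows; zeros if unlabeled
    # Limit of F <- alpha * S @ F + (1 - alpha) * Y, up to the constant factor (1 - alpha).
    F = np.linalg.solve(np.eye(len(y)) - alpha * S, Y)
    return classes[F.argmax(axis=1)]


# Toy usage: two well-separated Gaussian blobs with one labeled observation each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
y = np.full(40, -1)
y[0], y[20] = 0, 1
C = consensus_rates(X)
pred = propagate(C, y)
print(pred)
```

In this sketch the consensus matrix replaces the usual single k-nearest-neighbor or Gaussian-kernel graph, so two observations are strongly connected only if many different clusterings agree that they belong together. This is one plausible way to summarize "multiple clustering solutions" into a single affinity; the paper's own weighting or normalization may differ.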
