Hierarchical cluster kernels for supervised and semi-supervised learning

Semi-supervised learning has become an important subdomain of machine learning in recent years. These methods try to exploit the information provided by large, easily gathered unlabeled data sets in addition to the labeled training set. Accordingly, many semi-supervised kernels have appeared that determine similarity in feature space by also taking the unlabeled data points into account. In this paper we propose a novel kernel construction algorithm for supervised and semi-supervised learning, which in fact constitutes a general framework for semi-supervised kernel construction. The technique is based on the cluster assumption: we cluster the labeled and unlabeled data with an agglomerative clustering technique, and then use the linkage distances induced by the clustering hierarchy to construct our kernel. The hierarchical cluster kernel is then compared to other existing techniques and evaluated on synthetic and real data sets.
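
To make the idea concrete, the following is a minimal sketch of one way such a kernel could be built; it is not necessarily the construction used in the paper. It clusters labeled and unlabeled points together with SciPy's agglomerative linkage, reads off the cophenetic (linkage) distances from the hierarchy, and turns them into a Gram matrix. The final step, double centering of the squared distances as in classical multidimensional scaling, is an assumption made here for illustration, since the abstract does not say how distances are converted into a kernel. The function name `hierarchical_cluster_kernel`, the choice of average linkage, and the toy data are likewise hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def hierarchical_cluster_kernel(X, method="average"):
    """Sketch: Gram matrix from the linkage distances of a cluster hierarchy.

    X holds labeled and unlabeled points together, one per row.
    `method` is any SciPy linkage method; single, complete, and average
    linkage produce hierarchies without inversions.
    """
    # Agglomerative clustering on all points (labels are not used here).
    Z = linkage(pdist(X), method=method)
    # Cophenetic distance: the linkage height at which two points first merge.
    U = squareform(cophenet(Z))
    # ASSUMPTION: convert distances to a kernel by double centering the
    # squared distances, as in classical multidimensional scaling.
    n = U.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ (U ** 2) @ J
    # Symmetrize to remove floating-point asymmetry.
    return (K + K.T) / 2


# Toy data (hypothetical): a few labeled points plus many unlabeled ones,
# drawn from two well-separated blobs.
rng = np.random.default_rng(0)
X_labeled = np.vstack([rng.normal(0.0, 0.3, (3, 2)),
                       rng.normal(3.0, 0.3, (3, 2))])
X_unlabeled = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                         rng.normal(3.0, 0.3, (20, 2))])
X_all = np.vstack([X_labeled, X_unlabeled])

K = hierarchical_cluster_kernel(X_all)
# Rows and columns 0..5 of K correspond to the labeled points.
K_train = K[:6, :6]
```

Under these assumptions, `K_train` could be passed to any kernel method that accepts a precomputed Gram matrix, while the unlabeled points influence it only through the shape of the hierarchy.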