Kernel-Based Transductive Learning with Nearest Neighbors

In the k-nearest neighbor (KNN) classifier, the nearest neighbors are drawn only from labeled data, which makes the classifier unsuitable for data sets that contain very few labeled points. In this paper, we address this classification problem by applying transduction to the KNN algorithm. For each data point we consider two groups of nearest neighbors: one from the labeled data and one from the unlabeled data. A kernel function assigns weights to the neighbors. We derive a recurrence relation among neighboring data points and then present two solutions to the classification problem. The first solves it by matrix computation and is suited to small or medium-size data sets. The second is an iterative algorithm for large data sets that minimizes an energy function as it iterates. Experiments show that both solutions achieve high performance and that the iterative algorithm converges quickly.
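
The abstract does not state the recurrence itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes binary labels in {-1, +1}, a Gaussian kernel, and a recurrence in which each unlabeled point's score is the kernel-weighted average of its labeled neighbors' labels and its unlabeled neighbors' current scores. The names `gaussian_kernel`, `nearest`, and `transductive_knn`, and the parameters `k_lab`, `k_unl`, and `sigma`, are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) weight between two points (assumed kernel choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def nearest(point, candidates, k, skip=None):
    """Indices of the k candidates closest to `point`, optionally skipping one index."""
    idx = [j for j in range(len(candidates)) if j != skip]
    idx.sort(key=lambda j: sum((x - y) ** 2 for x, y in zip(point, candidates[j])))
    return idx[:k]


def transductive_knn(X_lab, y_lab, X_unl, k_lab=2, k_unl=2,
                     sigma=1.0, tol=1e-6, max_iter=1000):
    """Return a soft score in [-1, 1] for every unlabeled point.

    Assumed recurrence (not taken from the paper):
        f_i = sum_l w_il * y_l + sum_u w_iu * f_u
    with kernel weights normalized to sum to 1 over both neighbor groups.
    """
    n = len(X_unl)
    fixed = []  # constant contribution from labeled neighbors
    links = []  # (index, weight) pairs pointing at unlabeled neighbors
    for i, x in enumerate(X_unl):
        lab = nearest(x, X_lab, k_lab)
        unl = nearest(x, X_unl, k_unl, skip=i)
        w_lab = [gaussian_kernel(x, X_lab[j], sigma) for j in lab]
        w_unl = [gaussian_kernel(x, X_unl[j], sigma) for j in unl]
        total = sum(w_lab) + sum(w_unl)
        if total == 0.0:
            # All weights underflowed: leave this point at a neutral score.
            fixed.append(0.0)
            links.append([])
            continue
        fixed.append(sum(w * y_lab[j] for w, j in zip(w_lab, lab)) / total)
        links.append([(j, w / total) for w, j in zip(w_unl, unl)])

    # Fixed-point iteration: reapply the recurrence until scores stop changing.
    f = [0.0] * n
    for _ in range(max_iter):
        new_f = [fixed[i] + sum(w * f[j] for j, w in links[i]) for i in range(n)]
        delta = max((abs(a - b) for a, b in zip(new_f, f)), default=0.0)
        f = new_f
        if delta < tol:
            break
    return f


# Two well-separated 1-D clusters with two labeled points each.
X_lab = [[0.0], [0.1], [5.0], [5.1]]
y_lab = [-1, -1, 1, 1]
X_unl = [[0.2], [0.3], [4.8], [4.9]]
scores = transductive_knn(X_lab, y_lab, X_unl, k_lab=2, k_unl=1)
predicted = [1 if s > 0 else -1 for s in scores]
```

Under these assumptions the same fixed point could also be obtained in one step by solving the linear system f = b + W f, where b holds the labeled contributions and W the unlabeled-neighbor weights, which mirrors the matrix-based route the abstract mentions for smaller data sets. The iteration above avoids forming and inverting that matrix.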
