Learning from a Test Set

Classification of partially labeled data requires linking the unlabeled input distribution P(x) with the conditional distribution P(y|x) obtained from the labeled data. The latter should, for example, vary little in high-density regions. The key problem is to articulate a general principle behind this and other such reasonable assumptions. In this paper we provide a new approach to semi-supervised learning based on the stability of estimated labels for the unlabeled dataset, e.g., a large test set, and the maximization of the mutual label relation. No clustering assumptions are required, and the approach remains tractable even for continuous marginal class densities. We demonstrate the approach on synthetic examples and UCI repository datasets.
