Access to Unlabeled Data can Speed up Prediction Time

Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL focused on how the availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework that uses unlabeled examples to learn classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements from the use of unlabeled data. As a side benefit of our analysis, we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from far fewer labeled examples than without such a sample.
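The abstract states the paradigm only at a high level. A minimal sketch of one plausible instantiation of the two-stage idea follows: train an accurate but slow-to-evaluate classifier on the small labeled sample, use it to pseudo-label the large unlabeled pool, and then fit a fast classifier from a predefined class on the pseudo-labeled data. The specific learners (a k-NN teacher, a linear student), the scikit-learn API, and the synthetic data stand-in are assumptions for illustration, not the paper's prescribed implementation.

```python
# Hypothetical sketch of the two-stage SSL paradigm: slow teacher on the
# labeled sample, fast student on the pseudo-labeled unlabeled pool.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the data source: a small labeled sample plus a
# large unlabeled pool drawn from the same distribution.
X, y = make_classification(n_samples=10_500, n_features=20, random_state=0)
X_lab, y_lab = X[:500], y[:500]   # few labeled examples
X_unlab = X[500:]                 # many unlabeled examples

# Stage 1: fit an accurate but slow-to-predict classifier (k-NN prediction
# cost grows with the size of the stored sample) on the labeled data.
slow = KNeighborsClassifier(n_neighbors=5).fit(X_lab, y_lab)

# Stage 2: pseudo-label the unlabeled pool with the slow classifier, then
# train a fast classifier from a predefined class (here a linear model
# with constant-time prediction) on the pseudo-labeled data.
pseudo_y = slow.predict(X_unlab)
fast = LogisticRegression(max_iter=1000).fit(X_unlab, pseudo_y)

# `fast` is the deployed predictor: one dot product per query instead of
# a nearest-neighbor search over the training sample.
```

Under a cluster-type assumption, the pseudo-labels are accurate enough that the fast classifier approaches the slow one's accuracy while predicting much faster, which is the trade-off the paper analyzes formally.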
