Unbiased generative semi-supervised learning

Reliable semi-supervised learning, where a small amount of labelled data is complemented by a large body of unlabelled data, has been a long-standing goal of the machine learning community. However, while it seems intuitively obvious that unlabelled data can aid the learning process, in practise its performance has often been disappointing. We investigate this by examining generative maximum likelihood semi-supervised learning and derive novel upper and lower bounds on the degree of bias introduced by the unlabelled data. These bounds improve upon those provided in previous work, and are specifically applicable to the challenging case where the model is unable to exactly fit to the underlying distribution a situation which is common in practise, but for which fewer guarantees of semi-supervised performance have been found. Inspired by this new framework for analysing bounds, we propose a new, simple reweighing scheme which provides a provably unbiased estimator for arbitrary model/distribution pairs--an unusual property for a semi-supervised algorithm. This reweighing introduces no additional computational complexity and can be applied to very many models. Additionally, we provide specific conditions demonstrating the circumstance under which the unlabelled data will lower the estimator variance, thereby improving convergence.

[1]  Robert T. Collins,et al.  A Generative Model for Simultaneous Estimation of Human Body Shape and Pixel-Level Segmentation , 2012, ECCV.

[2]  Christopher Joseph Pal,et al.  Semi-supervised classification with hybrid generative/discriminative methods , 2007, KDD '07.

[3]  Ji Zhu,et al.  A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning , 2004, NIPS.

[4]  G. F. Hughes,et al.  On the mean accuracy of statistical pattern recognizers , 1968, IEEE Trans. Inf. Theory.

[5]  Tin Kam Ho,et al.  Building projectable classifiers of arbitrary complexity , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[6]  Fabio Gagliardi Cozman,et al.  Semi-Supervised Learning of Mixture Models , 2003, ICML.

[7]  Ben Taskar,et al.  Semi-Supervised Learning with Adversarially Missing Label Information , 2010, NIPS.

[8]  Jeff A. Bilmes,et al.  Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification , 2009, NIPS.

[9]  Vittorio Castelli,et al.  The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter , 1996, IEEE Trans. Inf. Theory.

[10]  Lawrence Carin,et al.  Semi-Supervised Classification , 2004, Encyclopedia of Database Systems.

[11]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[12]  Yuhong Xiong,et al.  Erratum to "Mining Distinction and Commonality across Multiple Domains Using Generative Model for Text Classification" , 2012, IEEE Trans. Knowl. Data Eng..

[13]  Seong Joon Yoo,et al.  Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews , 2012, Expert Syst. Appl..

[14]  Santosh S. Venkatesh,et al.  Learning from a mixture of labeled and unlabeled examples with parametric side information , 1995, COLT '95.

[15]  Gideon S. Mann,et al.  Simple, robust, scalable semi-supervised learning via expectation regularization , 2007, ICML '07.

[16]  Christopher Joseph Pal,et al.  Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification , 2006, AAAI.

[17]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[18]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[19]  Fabio Gagliardi Cozman,et al.  Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers , 2006, Semi-Supervised Learning.

[20]  Jeff A. Bilmes,et al.  Soft-Supervised Learning for Text Classification , 2008, EMNLP.

[21]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[22]  Haym Hirsh,et al.  Improving Short-Text Classification using Unlabeled Data for Classification Problems , 2000, ICML.

[23]  Tom Minka,et al.  Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[25]  Junhui Wang,et al.  On Transductive Support Vector Machines , 2006 .

[26]  I-Cheng Yeh,et al.  Knowledge discovery on RFM model using Bernoulli sequence , 2009, Expert Syst. Appl..

[27]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classication , 2008 .

[28]  Vittorio Castelli,et al.  On the exponential value of labeled samples , 1995, Pattern Recognit. Lett..

[29]  Thomas Seidl,et al.  Modeling image similarity by Gaussian mixture models and the Signature Quadratic Form Distance , 2011, 2011 International Conference on Computer Vision.

[30]  Michael I. Jordan,et al.  On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[31]  David A. Landgrebe,et al.  The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon , 1994, IEEE Trans. Geosci. Remote. Sens..

[32]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[33]  Julian Eggert,et al.  Expectation Truncation and the Benefits of Preselection In Training Generative Models , 2010, J. Mach. Learn. Res..

[34]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[35]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[36]  Thorsten Joachims,et al.  Transductive Support Vector Machines , 2006, Semi-Supervised Learning.

[37]  Stephen J. Wright,et al.  Numerical Optimization (Springer Series in Operations Research and Financial Engineering) , 2000 .

[38]  Jeffrey C. Lagarias,et al.  Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions , 1998, SIAM J. Optim..

[39]  Krishnakumar Balasubramanian,et al.  Asymptotic Analysis of Generative Semi-Supervised Learning , 2010, ICML.

[40]  Fabio Gagliardi Cozman,et al.  Unlabeled Data Can Degrade Classification Performance of Generative Classifiers , 2002, FLAIRS.

[41]  Hui Xiong,et al.  Mining Distinction and Commonality across Multiple Domains Using Generative Model for Text Classification , 2012, IEEE Transactions on Knowledge and Data Engineering.

[42]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[43]  Yoshua Bengio,et al.  Entropy Regularization , 2006, Semi-Supervised Learning.

[44]  Carey E. Priebe,et al.  The Effect of Model Misspecification on Semi-Supervised Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Maria-Florina Balcan,et al.  A discriminative model for semi-supervised learning , 2010, J. ACM.

[46]  Tommi S. Jaakkola,et al.  Information Regularization with Partially Labeled Data , 2002, NIPS.

[47]  Tong Zhang,et al.  The Value of Unlabeled Data for Classification Problems , 2000, ICML 2000.

[48]  Edwin Thompson Jaynes,et al.  Probability theory , 2003 .

[49]  Alexander Zien,et al.  An Augmented PAC Model for Semi-Supervised Learning , 2006 .