Theoretical Comparisons of Learning from Positive-Negative, Positive-Unlabeled, and Negative-Unlabeled Data

In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data alone, without negative (N) data. Even though N data are missing, PU learning has been observed to outperform PN learning (i.e., ordinary supervised learning) in experiments. In this paper, we theoretically compare PU learning (and its counterpart, NU learning) against PN learning, and prove that, given infinite U data, one of PU and NU learning almost always improves upon PN learning. Our theoretical finding is also validated experimentally.
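As background on how a classifier can be trained without N data: the risk of a decision function g can be estimated from P and U samples alone by rewriting the negative-class term through the unlabeled marginal, following the unbiased PU risk estimator of du Plessis, Niu, and Sugiyama (ICML 2015). Below is a minimal NumPy sketch of that estimator; the sigmoid loss, the function names, and the assumption that the class prior pi is known (or pre-estimated) are illustrative choices, not this paper's exact setup.

```python
import numpy as np

def sigmoid_loss(z):
    # Sigmoid loss l(z) = 1 / (1 + exp(z)), a smooth surrogate of the 0-1 loss.
    return 1.0 / (1.0 + np.exp(z))

def pu_risk(g_p, g_u, pi):
    """Unbiased PU estimate of the classification risk of g.

    g_p : classifier outputs g(x) on positive samples
    g_u : classifier outputs g(x) on unlabeled samples
    pi  : class prior p(y = +1), assumed known or pre-estimated

    Uses the identity
        R(g) = pi * E_P[l(g) - l(-g)] + E_U[l(-g)],
    which follows from p(x) = pi * p_P(x) + (1 - pi) * p_N(x),
    so the negative-class term is expressed via U data.
    """
    risk_p = np.mean(sigmoid_loss(g_p) - sigmoid_loss(-g_p))
    risk_u = np.mean(sigmoid_loss(-g_u))
    return pi * risk_p + risk_u
```

Note that the sigmoid loss satisfies l(z) + l(-z) = 1, so the P-term above simplifies to pi * (2 * E_P[l(g)] - 1); the general form is kept in the sketch so that other losses can be swapped in.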
