Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric

We consider the problem of learning a binary classifier from only positive and unlabeled observations (PU learning). Recent PU learning methods have shown strong performance both theoretically and empirically. However, most existing algorithms may not be suitable for large-scale datasets because they require repeated computation of a large Gram matrix or extensive hyperparameter optimization. In this paper, we propose a computationally efficient and theoretically grounded PU learning algorithm. The proposed algorithm produces a closed-form classifier when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. In addition, we establish upper bounds on the estimation error and the excess risk. The resulting estimation error bound is sharper than existing results, and the excess risk bound has an explicit form that vanishes as the sample sizes increase. Finally, we conduct extensive numerical experiments on both synthetic and real datasets, demonstrating the improved accuracy, scalability, and robustness of the proposed algorithm.
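The closed-form classifier mentioned above can be illustrated with a small sketch. Under a linear surrogate loss, the PU risk reduces (up to constants) to −(2π E_P[f] − E_U[f])/2, so minimizing it over an RKHS ball amounts to maximizing a weighted integral probability metric between the positive and unlabeled distributions, whose maximizer is a witness function built from kernel mean embeddings. The Python sketch below assumes a Gaussian kernel, a known class prior π, and a decision function of the form sign{(2π/n_p)Σ_i k(x, x_i^P) − (1/n_u)Σ_j k(x, x_j^U)}; it illustrates the idea rather than reproducing the paper's exact estimator (the bandwidth choice, normalization, and thresholding here are illustrative assumptions).

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def pu_witness_classifier(X_pos, X_unl, prior, sigma=1.0):
    """Closed-form PU decision function of the weighted mean-embedding-difference
    (witness-function) form: (2*pi/n_p) sum_i k(x, x_i^P) - (1/n_u) sum_j k(x, x_j^U).
    `prior` is the positive class prior pi, assumed known or estimated separately."""
    def decision(X):
        kp = gaussian_kernel(X, X_pos, sigma).mean(axis=1)  # empirical mean embedding of P
        ku = gaussian_kernel(X, X_unl, sigma).mean(axis=1)  # empirical mean embedding of U
        return 2 * prior * kp - ku
    return decision

# Toy usage: positives from N(+1, I); unlabeled data drawn as a pi-mixture of both classes.
rng = np.random.default_rng(0)
pi = 0.4
X_pos = rng.normal(loc=1.0, size=(200, 2))
X_unl = np.vstack([rng.normal(loc=1.0, size=(int(500 * pi), 2)),
                   rng.normal(loc=-1.0, size=(int(500 * (1 - pi)), 2))])
f = pu_witness_classifier(X_pos, X_unl, prior=pi, sigma=1.0)
X_test = rng.normal(loc=1.0, size=(100, 2))
y_pred = np.sign(f(X_test))  # +1 predicted positive, -1 predicted negative
```

Because the decision function is an explicit weighted sum of kernel evaluations, no iterative optimization or repeated Gram-matrix factorization is needed at training time, which is the source of the scalability claim in the abstract.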
