Efficient Training for Positive Unlabeled Learning

Positive unlabeled (PU) learning is useful in various practical situations where a classifier for a class of interest must be learned from an unlabeled data set that may contain anomalies as well as samples from unknown classes. The learning task can be formulated as an optimization problem under the framework of statistical learning theory. Recent studies have theoretically analyzed its properties and generalization performance; nevertheless, little effort has been made to address scalability, especially when large sets of unlabeled data are available. In this work, we propose a novel scalable PU learning algorithm that is theoretically proven to provide the optimal solution while showing superior computational and memory performance. Experimental evaluation confirms the theoretical evidence and shows that the proposed method can be successfully applied to a large variety of real-world problems involving PU learning.
