A robust ensemble approach to learn from positive and unlabeled data using SVM base models

We present a novel approach to learning binary classifiers when only positive and unlabeled instances are available (PU learning). This problem is routinely cast as a supervised task with label noise in the negative set. To increase robustness against label noise, we use an ensemble of SVM models trained on bootstrap resamples of the training data. The approach can be viewed within a bagging framework, which provides an intuitive explanation of its mechanics in a semi-supervised setting. We compare our method to state-of-the-art approaches in simulations on multiple public benchmark data sets. The benchmark comprises three settings with increasing label noise: (i) fully supervised, (ii) PU learning and (iii) PU learning with false positives. Our approach shows a marginal improvement over existing methods in the second setting and a significant improvement in the third.
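The general idea of bagging SVM base models on resampled positive and unlabeled data can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. It assumes linear base models fitted by stochastic subgradient descent on a cost-weighted hinge loss, resampling of both the positive and the unlabeled set, and aggregation by vote fraction. All function names, the costs `c_pos` and `c_unl`, the resample sizes and the toy data are hypothetical choices made for illustration only.

```python
import random

def train_linear_svm(X, y, cost, lam=0.01, eta=0.01, epochs=50, rng=None):
    """Fit (w, b) by SGD on a cost-weighted hinge loss with L2 penalty lam."""
    rng = rng or random.Random(0)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss is active for this instance
                step = eta * cost[i] * y[i]
                w = [wj + step * xj for wj, xj in zip(w, X[i])]
                b += step
    return w, b

def fit_pu_ensemble(positives, unlabeled, n_models=25, n_pos=None, n_unl=None,
                    c_pos=2.0, c_unl=1.0, seed=0):
    """Train each base model on bootstrap resamples of both sets.

    Unlabeled instances are treated as negatives with label noise.
    The default sizes and costs are assumptions, not tuned values.
    """
    rng = random.Random(seed)
    n_pos = n_pos or len(positives)
    n_unl = n_unl or len(positives)
    models = []
    for _ in range(n_models):
        pos = [rng.choice(positives) for _ in range(n_pos)]  # with replacement
        unl = [rng.choice(unlabeled) for _ in range(n_unl)]  # with replacement
        X = pos + unl
        y = [1.0] * n_pos + [-1.0] * n_unl
        cost = [c_pos] * n_pos + [c_unl] * n_unl
        models.append(train_linear_svm(X, y, cost, rng=rng))
    return models

def vote_fraction(models, x):
    """Fraction of base models that predict the positive class for x."""
    votes = sum(1 for w, b in models
                if sum(wj * xj for wj, xj in zip(w, x)) + b > 0)
    return votes / len(models)

# Toy data (assumed): known positives near (2, 2); the unlabeled set mixes
# hidden positives near (2, 2) with true negatives near (-2, -2).
data_rng = random.Random(1)

def blob(cx, cy, n):
    return [[data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5)] for _ in range(n)]

positives = blob(2, 2, 20)
unlabeled = blob(2, 2, 10) + blob(-2, -2, 40)
models = fit_pu_ensemble(positives, unlabeled)
print(vote_fraction(models, [2.0, 2.0]), vote_fraction(models, [-2.0, -2.0]))
```

Because every base model sees a different resample, the hidden positives in the unlabeled set contaminate each model's negative set differently, and aggregating the votes averages out part of that variability. This is the intuition behind the bagging view described above.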
