A Robust AUC Maximization Framework With Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification

The positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics, in which we only observe a few samples labeled as “positive” together with a large volume of “unlabeled” samples that may contain both positive and negative samples. Building robust classifiers for the PU problem is very challenging, especially for complex data where the negative samples overwhelm and mislabeled samples or corrupted features exist. To address these three issues, we propose a robust learning framework that unifies area under the curve maximization (a robust metric for biased labels), outlier detection (for excluding wrong labels), and feature selection (for excluding corrupted features). The generalization error bounds are provided for the proposed model that give valuable insight into the theoretical performance of the method and lead to useful practical guidance, e.g., to train a model, we find that the included unlabeled samples are sufficient as long as the sample size is comparable to the number of positive samples in the training process. Empirical comparisons and two real-world applications on surgical site infection (SSI) and EEG seizure detection are also conducted to show the effectiveness of the proposed model.

[1]  Deanna Needell,et al.  Linear Convergence of Stochastic Iterative Greedy Algorithms With Sparse Constraints , 2014, IEEE Transactions on Information Theory.

[2]  Gang Niu,et al.  Convex Formulation for Learning from Positive and Unlabeled Data , 2015, ICML.

[3]  Alain Rakotomamonjy,et al.  Optimizing Area Under Roc Curve with SVMs , 2004, ROCAI.

[4]  Ulf Brefeld,et al.  {AUC} maximizing support vector learning , 2005 .

[5]  O. Chapelle Multi-Class Feature Selection with Support Vector Machines , 2008 .

[6]  Gilles Blanchard,et al.  Semi-Supervised Novelty Detection , 2010, J. Mach. Learn. Res..

[7]  N. Graham,et al.  Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation , 2002 .

[8]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[9]  Joel A. Tropp,et al.  Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.

[10]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[11]  Raymond T. Ng,et al.  Finding Intensional Knowledge of Distance-Based Outliers , 1999, VLDB.

[12]  Gilles Blanchard,et al.  Novelty detection: Unlabeled data definitely help , 2009, AISTATS.

[13]  Le Song,et al.  Relative Novelty Detection , 2009, AISTATS.

[14]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[15]  M. N. Nguyen,et al.  pro-Positive Unlabeled Learning for Time Series Classification , 2022 .

[16]  Xiao-Tong Yuan,et al.  Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization , 2013, ICML.

[17]  Feiping Nie,et al.  Feature Selection via Scaling Factor Integrated Multi-Class Support Vector Machines , 2017, International Joint Conference on Artificial Intelligence.

[18]  H. Johnson,et al.  A comparison of 'traditional' and multimedia information systems development practices , 2003, Inf. Softw. Technol..

[19]  Wenkai Li,et al.  A Positive and Unlabeled Learning Algorithm for One-Class Classification of Remote-Sensing Data , 2011, IEEE Transactions on Geoscience and Remote Sensing.

[20]  Tong Zhang,et al.  Sparse Recovery With Orthogonal Matching Pursuit Under RIP , 2010, IEEE Transactions on Information Theory.

[21]  Gang Niu,et al.  Analysis of Learning from Positive and Unlabeled Data , 2014, NIPS.

[22]  Kaizhu Huang,et al.  Biased support vector machine for relevance feedback in image retrieval , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[23]  Jieping Ye,et al.  Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint , 2013, ICML.

[24]  Chee Keong Kwoh,et al.  Positive-unlabeled learning for disease gene identification , 2012, Bioinform..

[25]  Ching-Yung Lin,et al.  Healthcare audio event classification using Hidden Markov Models and Hierarchical Hidden Markov Models , 2009, 2009 IEEE International Conference on Multimedia and Expo.

[26]  Bing Liu,et al.  Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression , 2003, ICML.

[27]  Tijl De Bie,et al.  Kernel-based data fusion for gene prioritization , 2007, ISMB/ECCB.

[28]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[29]  Jean-Philippe Vert,et al.  A bagging SVM to learn from positive and unlabeled examples , 2010, Pattern Recognit. Lett..

[30]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[31]  Masashi Sugiyama,et al.  Class Prior Estimation from Positive and Unlabeled Data , 2014, IEICE Trans. Inf. Syst..

[32]  Genevera I. Allen,et al.  Within Group Variable Selection through the Exclusive Lasso , 2015, 1505.07517.

[33]  Mehryar Mohri,et al.  AUC Optimization vs. Error Rate Minimization , 2003, NIPS.

[34]  Xiaoli Li,et al.  Learning to Classify Texts Using Positive and Unlabeled Data , 2003, IJCAI.

[35]  Malik Yousef,et al.  One-Class SVMs for Document Classification , 2002, J. Mach. Learn. Res..

[36]  Pedro Larrañaga,et al.  Learning Bayesian classifiers from positive and unlabeled examples , 2007, Pattern Recognit. Lett..

[37]  Michael I. Jordan,et al.  Multi-task feature selection , 2006 .

[38]  Charles Elkan,et al.  Learning classifiers from only positive and unlabeled data , 2008, KDD.

[39]  Graham J. Williams,et al.  On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms , 2000, KDD '00.

[40]  Yang Yu,et al.  Automatic image annotation using group sparsity , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Don R. Hush,et al.  Network constraints and multi-objective optimization for one-class classification , 1996, Neural Networks.

[42]  Philip S. Yu,et al.  Partially Supervised Classification of Text Documents , 2002, ICML.

[43]  V. Vapnik Estimation of Dependences Based on Empirical Data , 2006 .

[44]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[45]  Tong Zhang,et al.  Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models , 2008, NIPS.

[46]  Guido Smits,et al.  Robust outlier detection using SVM regression , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[47]  Johan A. K. Suykens,et al.  Transductively Learning from Positive Examples Only , 2009, ESANN.

[48]  See-Kiong Ng,et al.  Positive Unlabeled Leaning for Time Series Classification , 2011, IJCAI.

[49]  Philip S. Yu,et al.  Positive Unlabeled Learning for Data Stream Classification , 2009, SDM.

[50]  Harleen Kaur,et al.  Empirical Study on Applications of Data Mining Techniques in Healthcare , 2006 .

[51]  Kevin Chen-Chuan Chang,et al.  PEBL: positive example based learning for Web page classification using SVM , 2002, KDD.

[52]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[53]  Shian-Shyong Tseng,et al.  Two-phase clustering process for outliers detection , 2001, Pattern Recognit. Lett..

[54]  Ya-Feng Liu,et al.  LLE Score: A New Filter-Based Unsupervised Feature Selection Method Based on Nonlinear Manifold Embedding and Its Application to Image Recognition , 2017, IEEE Transactions on Image Processing.

[55]  E. V. Thomas,et al.  Outlier detection in multivariate calibration , 1989 .

[56]  Deng Cai,et al.  Laplacian Score for Feature Selection , 2005, NIPS.

[57]  Robert P. W. Duin,et al.  Support vector domain description , 1999, Pattern Recognit. Lett..

[58]  R. Tibshirani,et al.  Sparse inverse covariance estimation with the graphical lasso. , 2008, Biostatistics.

[59]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[60]  M. Hubert,et al.  Robust methods for partial least squares regression , 2003 .

[61]  Junzhou Huang,et al.  The Benefit of Group Sparsity , 2009 .

[62]  Kenji Yamanishi,et al.  Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner , 2001, KDD '01.

[63]  G. Lightbody,et al.  A comparison of quantitative EEG features for neonatal seizure detection , 2008, Clinical Neurophysiology.

[64]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.