DEDPUL: Method for Mixture Proportion Estimation and Positive-Unlabeled Classification based on Density Estimation

This paper studies Positive-Unlabeled Classification, the problem of semi-supervised binary classification in which the Negative (N) class in the training set is contaminated with instances of the Positive (P) class. We develop a novel method, DEDPUL, that simultaneously solves two problems concerning the contaminated Unlabeled (U) sample: it estimates the proportions of the mixing components (P and N) in U, and it classifies U. In experiments on synthetic and real-world data, DEDPUL compares favorably with current state-of-the-art methods on both problems. We introduce an automatic procedure for DEDPUL hyperparameter optimization. Additionally, we improve two methods from the literature and achieve DEDPUL-level performance with one of them.
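To make the two problems concrete, here is a minimal sketch of the mixture setting. It is not the DEDPUL procedure. It assumes a toy one-dimensional case in which the P and N densities are known unit-variance Gaussians, whereas a real method has to estimate densities from samples. All function names and parameter values are hypothetical. The sketch uses the standard identity f_U(x) / f_P(x) = alpha + (1 - alpha) * f_N(x) / f_P(x), which is never below alpha, so the minimum of the ratio bounds the proportion of P in U from above. The same densities give the posterior probability used to classify points of U.

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, alpha, pos_mean, neg_mean):
    """Density of U: a mixture of P (weight alpha) and N (weight 1 - alpha)."""
    return alpha * normal_pdf(x, pos_mean) + (1.0 - alpha) * normal_pdf(x, neg_mean)


def proportion_upper_bound(grid, alpha, pos_mean, neg_mean):
    """Minimum of f_U / f_P over the grid, an upper bound on alpha."""
    return min(
        mixture_pdf(x, alpha, pos_mean, neg_mean) / normal_pdf(x, pos_mean)
        for x in grid
    )


def positive_posterior(x, alpha, pos_mean, neg_mean):
    """Probability that a point x drawn from U belongs to P (Bayes' rule)."""
    return alpha * normal_pdf(x, pos_mean) / mixture_pdf(x, alpha, pos_mean, neg_mean)


# Assumed toy parameters: 30% of U is positive, P ~ N(2, 1), N ~ N(-2, 1).
ALPHA, POS_MEAN, NEG_MEAN = 0.3, 2.0, -2.0
GRID = [-6.0 + 0.1 * i for i in range(161)]  # points from -6 to 10

bound = proportion_upper_bound(GRID, ALPHA, POS_MEAN, NEG_MEAN)
print(f"upper bound on alpha: {bound:.6f}")
print(f"posterior at x = 2:  {positive_posterior(2.0, ALPHA, POS_MEAN, NEG_MEAN):.4f}")
print(f"posterior at x = -2: {positive_posterior(-2.0, ALPHA, POS_MEAN, NEG_MEAN):.4f}")
```

In this toy case the bound is tight because f_N / f_P vanishes far to the right of the grid. When the supports of P and N overlap everywhere, the minimum of the ratio can overestimate the true proportion.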
