Class-prior estimation for learning from positive and unlabeled data

We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized $$L_1$$-distance gives a computationally efficient algorithm with an analytic solution. The consistency, stability, and estimation error are theoretically analyzed. Finally, we experimentally demonstrate the usefulness of the proposed method.
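To make the mixture-fitting idea concrete, here is a minimal toy sketch in Python. It fits a scaled positive density $$\theta \, \hat{p}_\mathrm{pos}$$ to an unlabeled density $$\hat{p}_\mathrm{u}$$ under the plain (unpenalized) $$L_1$$-distance, using Gaussian kernel density estimates and a grid search over $$\theta$$. The function names, the KDE stand-ins, and the grid search are all illustrative assumptions; this sketch does not implement the paper's penalized estimator or its analytic solution.

```python
import numpy as np

def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density
    estimate of `samples` at an array of query points."""
    def density(x):
        z = (np.asarray(x)[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z**2).sum(axis=1) / (
            len(samples) * bandwidth * np.sqrt(2.0 * np.pi))
    return density

def estimate_class_prior(x_pos, x_unlabeled, bandwidth=0.3):
    """Naive (unpenalized) class-prior estimate: grid-search the theta
    minimizing the L1 distance between theta * p_pos and p_u, via
      L1(theta) = 2 * int max(theta*p_pos(x) - p_u(x), 0) dx + 1 - theta.
    Toy stand-in only; the paper instead minimizes a *penalized*
    L1 distance, which admits an analytic solution."""
    x_pos = np.asarray(x_pos)
    x_unlabeled = np.asarray(x_unlabeled)
    p_pos = gaussian_kde(x_pos, bandwidth)
    p_u = gaussian_kde(x_unlabeled, bandwidth)

    # Evaluate both density estimates on a 1-D integration grid.
    lo = min(x_pos.min(), x_unlabeled.min()) - 3.0 * bandwidth
    hi = max(x_pos.max(), x_unlabeled.max()) + 3.0 * bandwidth
    grid = np.linspace(lo, hi, 1000)
    dx = grid[1] - grid[0]
    pos_vals, u_vals = p_pos(grid), p_u(grid)

    # L1(theta) for each candidate prior, rectangle-rule integration.
    thetas = np.linspace(0.0, 1.0, 201)
    excess = np.maximum(thetas[:, None] * pos_vals - u_vals, 0.0)
    l1 = 2.0 * excess.sum(axis=1) * dx + 1.0 - thetas
    return thetas[np.argmin(l1)]

# Toy check: the unlabeled sample is a 30/70 mixture of the positive
# and negative components, so the estimate should land near 0.3.
rng = np.random.default_rng(0)
x_pos = rng.normal(-1.0, 1.0, size=1000)
x_unl = np.where(rng.random(2000) < 0.3,
                 rng.normal(-1.0, 1.0, size=2000),
                 rng.normal(+2.0, 1.0, size=2000))
print(estimate_class_prior(x_pos, x_unl))  # ~0.3
```

With heavily overlapping classes this unpenalized fit is biased; cancelling the error caused by the absence of negative samples is precisely what the paper's penalized divergences are designed to do.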
