Class Prior Estimation from Positive and Unlabeled Data

We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.

[1]  Masashi Sugiyama,et al.  Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching , 2012, ICML.

[2]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[3]  Wenkai Li,et al.  A Positive and Unlabeled Learning Algorithm for One-Class Classification of Remote-Sensing Data , 2011, IEEE Transactions on Geoscience and Remote Sensing.

[4]  Martin J. Wainwright,et al.  Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization , 2008, IEEE Transactions on Information Theory.

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  Charles Elkan,et al.  Learning classifiers from only positive and unlabeled data , 2008, KDD.

[7]  Masashi Sugiyama,et al.  Superfast-Trainable Multi-Class Probabilistic Classifier by Least-Squares Posterior Fitting , 2010, IEICE Trans. Inf. Syst..

[8]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[9]  Takafumi Kanamori,et al.  Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection , 2008, NIPS.

[10]  Takafumi Kanamori,et al.  Inlier-Based Outlier Detection via Direct Density Ratio Estimation , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[11]  A. Keziou Dual representation of Φ-divergences and applications , 2003 .

[12]  Takafumi Kanamori,et al.  A Least-squares Approach to Direct Importance Estimation , 2009, J. Mach. Learn. Res..