Nearest Neighbor Algorithm for Positive and Unlabeled Learning with Uncertainty

This paper studies the classification of uncertain data in the positive and unlabeled (PU) learning scenario. It proposes a novel algorithm, NNPU (nearest neighbor algorithm for positive and unlabeled learning), in two variants, NNPUa and NNPUu. Experimental results on benchmark UCI datasets show that NNPUu, which uses the full uncertainty information in the datasets, classifies unseen examples better than NNPUa, which uses only the average value of the uncertain data. Furthermore, on precise (certain) data, NNPU outperforms existing algorithms such as NN-d, OCC (one-class classifier) and aPUNB.
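The abstract does not give NNPU's distance measure or decision rule, so the following is only a minimal sketch of the distinction between an "average" variant and a "whole uncertainty" variant, not the authors' method. It assumes that each uncertain example is a list of equally likely sample points. `dist_average` collapses each object to its mean before taking the Euclidean distance, and `dist_expected` averages the Euclidean distance over all pairs of samples. The one-class nearest-neighbor rule `nn_positive` and its threshold `tau` are hypothetical stand-ins for a PU decision rule. All function names are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
# Assumption: an uncertain example is represented by equally likely sample points.
UncertainObject = List[Point]


def euclidean(a: Point, b: Point) -> float:
    """Plain Euclidean distance between two precise points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(obj: UncertainObject) -> List[float]:
    """Collapse an uncertain object to the coordinate-wise mean of its samples."""
    n, dim = len(obj), len(obj[0])
    return [sum(p[d] for p in obj) / n for d in range(dim)]


def dist_average(u: UncertainObject, v: UncertainObject) -> float:
    """Average-style distance: compare only the means of the two objects."""
    return euclidean(mean_point(u), mean_point(v))


def dist_expected(u: UncertainObject, v: UncertainObject) -> float:
    """Whole-uncertainty distance: expected distance over all sample pairs."""
    total = sum(euclidean(p, q) for p in u for q in v)
    return total / (len(u) * len(v))


def nn_positive(query, positives, dist, tau: float) -> bool:
    """Hypothetical rule: predict positive if the nearest positive is within tau."""
    return min(dist(query, p) for p in positives) <= tau


# A spread-out positive example and a precise query sitting at its mean.
P = [[0.0, 0.0], [2.0, 0.0]]
Q = [[1.0, 0.0]]
print(dist_average(P, Q), dist_expected(P, Q))  # 0.0 1.0
print(nn_positive(Q, [P], dist_average, 0.5))   # True
print(nn_positive(Q, [P], dist_expected, 0.5))  # False
```

In this toy case, averaging hides the spread of the positive example, and the two distances lead to different decisions. In general, the expected distance is never smaller than the distance between the means, because the Euclidean norm is convex (Jensen's inequality).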
