Laplacian unit-hyperplane learning from positive and unlabeled examples

Highlights:

- We propose a Laplacian unit-hyperplane classifier (LUHC) for PU learning.
- LUHC exploits both the geometrical and the discriminant properties of the PU data.
- The quadratic programming problem (QPP) in LUHC is small, yielding fast training.
- A parameter ν controls the upper bound on the fraction of positive examples with margin errors.
- Experimental results demonstrate its efficiency and superiority.

In machine learning and data mining, learning from positive and unlabeled examples (PU learning) has attracted a great deal of attention, and classifiers for this setting are needed in many practical applications. For PU learning, we propose a novel classifier, the Laplacian unit-hyperplane classifier (LUHC), which determines a decision unit-hyperplane by solving a QPP. The advantages of LUHC are as follows: (1) both the geometrical and the discriminant properties of the examples are exploited, resulting in better classification performance; (2) the QPP to be solved is small, since its size depends only on the number of positive examples, resulting in faster training; (3) a meaningful parameter ν is introduced to control the upper bound on the fraction of positive examples with margin errors. Preliminary experiments on both synthetic and real-world data sets agree with these claims, suggesting that LUHC is superior to the biased support vector machine, spy expectation-maximization, and naive Bayes in both classification accuracy and computational efficiency.
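The paper's exact optimization problem is not reproduced above, but the three claims pin down its likely shape. The following is a minimal sketch, in the spirit of a ν-one-class SVM augmented with a graph-Laplacian manifold regularizer; the symbols λ (regularization trade-off), ρ (margin variable), and the nearest-neighbor graph construction are assumptions for illustration, not taken from the paper:

\[
\begin{aligned}
\min_{w,\,b,\,\rho,\,\xi}\quad
  & \tfrac{1}{2}\lVert w\rVert^{2}
    + \tfrac{\lambda}{2}\, f^{\top} L f
    - \nu\rho
    + \tfrac{1}{p}\sum_{i=1}^{p}\xi_{i} \\
\text{s.t.}\quad
  & w^{\top}x_{i} + b \;\ge\; \rho - \xi_{i},
    \qquad \xi_{i} \ge 0, \qquad i = 1,\dots,p,
\end{aligned}
\]

where \(f = (w^{\top}x_{1} + b,\,\dots,\,w^{\top}x_{n} + b)^{\top}\) collects the decision values on all \(n = p + u\) positive and unlabeled examples, and \(L\) is the graph Laplacian of a similarity graph (e.g., k-nearest-neighbor) built over all \(n\) examples. Under this reading, the Laplacian term supplies the geometrical information carried by the unlabeled data while the margin constraints supply the discriminant information from the positives; only the \(p\) positive examples generate constraints, so the dual QPP has just \(p\) variables, consistent with claim (2); and the standard ν-SVM argument gives that ν upper-bounds the fraction of positive examples with margin errors (\(\xi_{i} > 0\)), consistent with claim (3).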
