An Uncertainty- and Density-Based Active Learning Method for Positive and Unlabeled Data

Active learning selects the most informative unlabeled samples for manual annotation in order to enlarge the training set. Many active learning methods have been proposed, but most assume that labeled examples are available for every class. Only a few methods handle positive and unlabeled (PU) data, and their computational complexity is high, so they do not scale to big data. In this paper, we propose an active learning approach that works well when only a small number of positive samples are available in a large dataset. We use data preprocessing to remove most outliers, which simplifies the density calculation relative to the KNN algorithm, and our proposed sample selection strategy, Min-Uncertainty Density (MDD), selects unlabeled samples that are both more uncertain and of higher density with less computation. We further propose a combined semi-supervised active learning technique (MDD-SSAL) that automatically annotates some confidently predicted unlabeled samples in each iteration, reducing the number of samples that must be annotated manually. Experimental results indicate that our proposed method is competitive with similar methods.
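The selection strategy described above can be illustrated with a minimal sketch: score each unlabeled sample by the product of an uncertainty term (highest when the classifier's positive-class probability is near 0.5) and a density term (here a simple inverse mean distance to the k nearest neighbors), then query the top-scoring samples. This is an assumption-laden illustration, not the authors' implementation: the function name `mdd_select`, the `1 / (mean kNN distance)` density proxy, and the product scoring rule are all choices made for this sketch.

```python
import numpy as np

def mdd_select(probs, X, k=5, n_query=10):
    """Sketch of uncertainty-density sample selection (hypothetical helper,
    not the paper's MDD implementation).

    probs   : (n,) positive-class probabilities for unlabeled samples
    X       : (n, d) feature matrix for the same samples
    k       : neighborhood size for the density estimate
    n_query : number of samples to select for annotation
    """
    # Uncertainty: 1 - |2p - 1| is maximal (1.0) at p = 0.5, minimal at 0 or 1.
    uncertainty = 1.0 - np.abs(2.0 * probs - 1.0)

    # Density proxy: inverse of the mean distance to the k nearest neighbors,
    # computed by brute force (fine for a sketch; real code would use an index).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # exclude self-distance
    knn_dists = np.sort(dists, axis=1)[:, :k]
    density = 1.0 / (knn_dists.mean(axis=1) + 1e-12)
    density /= density.max()                   # normalize to [0, 1]

    # Combined score: favor samples that are both uncertain and dense.
    score = uncertainty * density
    return np.argsort(-score)[:n_query]
```

A real implementation would follow the paper in pruning outliers first, so the density term is computed over far fewer pairs than this brute-force version.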
