Boosting Positive and Unlabeled Learning for Anomaly Detection With Multi-Features

A key challenge in machine learning-based anomaly detection is obtaining anomaly data for training, since such data are usually rare, diversely distributed, and difficult to collect. To address this challenge, we formulate anomaly detection as a Positive and Unlabeled (PU) learning problem, in which only labeled positive (normal) data and unlabeled (normal and anomalous) data are required to learn an anomaly detector. Because this semi-supervised approach requires no labeled anomaly data for training, it can be readily deployed in a variety of applications. Since the unlabeled data can be extremely unbalanced, we introduce a novel PU learning method that handles the case where the unlabeled set is composed mostly of positive instances. We first use a linear model to extract the most reliable negative instances, then run a self-learning process that adds reliable negative and positive instances at different speeds based on the estimated positive class prior. Furthermore, when feedback is available, we adopt boosting in the self-learning process to exploit the instability of PU learning. The classifiers produced during self-learning are combined, weighted by their estimated error rates, to build the final classifier. Extensive experiments on six real datasets and one synthetic dataset show that our methods outperform existing methods under different conditions.
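The two-stage procedure described above (reliable-negative extraction with a linear model, then self-learning at prior-dependent speeds) can be illustrated with a minimal sketch. This is not the authors' implementation: the least-squares scorer, the function names, the `seed_frac` and `n_rounds` parameters, and the rule that splits each round's quota in proportion to the class prior are all assumptions made for illustration. The prior is passed in rather than estimated, and the boosting and error-weighted combination steps are omitted.

```python
import numpy as np

def _with_bias(X):
    # Append a constant column so the linear model has an intercept.
    return np.hstack([X, np.ones((len(X), 1))])

def fit_linear(X, y):
    # Least-squares linear scorer (assumed stand-in for "a linear model").
    w, *_ = np.linalg.lstsq(_with_bias(X), y, rcond=None)
    return w

def score(w, X):
    # Higher score = more likely positive (normal).
    return _with_bias(X) @ w

def _fit_on_labels(X_pos, X_unl, labels):
    # Train on labeled positives plus whatever has been pseudo-labeled so far.
    pos = np.vstack([X_pos, X_unl[labels == 1]])
    neg = X_unl[labels == -1]
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return fit_linear(np.vstack([pos, neg]), y)

def pu_self_training(X_pos, X_unl, prior, n_rounds=5, seed_frac=0.05):
    """Hypothetical PU self-training loop.

    prior: assumed fraction of positives in the unlabeled set.
    Returns the final weight vector and pseudo-labels (+1, -1, 0 = unassigned).
    """
    n_unl = len(X_unl)
    labels = np.zeros(n_unl, dtype=int)

    # Stage 1: treat all unlabeled data as negative, fit a linear model,
    # and keep the lowest-scoring unlabeled points as reliable negatives.
    y0 = np.concatenate([np.ones(len(X_pos)), -np.ones(n_unl)])
    w = fit_linear(np.vstack([X_pos, X_unl]), y0)
    n_seed = max(1, int(seed_frac * n_unl))
    labels[np.argsort(score(w, X_unl))[:n_seed]] = -1

    # Stage 2: self-learning. Per-round quotas follow the class prior, so
    # positives and negatives are added at different speeds (assumed rule).
    pos_quota = max(1, int(prior * n_unl / n_rounds))
    neg_quota = max(1, int((1.0 - prior) * n_unl / n_rounds))
    for _ in range(n_rounds):
        remaining = np.flatnonzero(labels == 0)
        if remaining.size == 0:
            break
        w = _fit_on_labels(X_pos, X_unl, labels)
        order = np.argsort(score(w, X_unl[remaining]))
        k_neg = min(neg_quota, remaining.size)
        labels[remaining[order[:k_neg]]] = -1      # most confident negatives
        rest = order[k_neg:]
        k_pos = min(pos_quota, rest.size)
        if k_pos > 0:
            labels[remaining[rest[-k_pos:]]] = 1   # most confident positives

    # Refit once more so the returned model reflects all pseudo-labels.
    return _fit_on_labels(X_pos, X_unl, labels), labels
```

Under these assumptions, a new instance `x` would be flagged as anomalous when `score(w, x[None, :])` is negative. An error-weighted ensemble of the per-round models, as the abstract describes, would replace the single returned `w`.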
