Semisupervised Prior Free Rare Category Detection With Mixed Criteria

Rare category detection aims to find interesting and statistically significant anomalies, and it incorporates ideas from both active learning and semisupervised learning. Its central challenge is to discover the rare classes among the anomalies of a data set whose class distribution is highly skewed. Most existing rare category detection methods assume that the exact number of classes is known in advance, an assumption that rarely holds in real scenarios. In this paper, we propose a new rare category detection framework that combines active learning with semisupervised hierarchical density-based clustering. Our method is prior free, and it exploits labeled data to improve the rare category detection process. In addition, the proposed framework can handle tasks that require nonlinear mappings, which improves its ability to find rare classes when class boundaries are complex. In experiments on both real and synthetic data sets, our method achieves better results than existing methods.
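
The abstract does not spell out the query criterion, so the following is only a hypothetical illustration of what a prior-free, active-learning style rare category detection loop can look like. It is not the paper's semisupervised hierarchical density-based clustering or its mixed criteria. The sketch scores each point by a sharp local density change, on the assumption that a rare class forms a compact cluster inside a sparser region. It then queries the top-scoring point from a labeling oracle and marks that point's neighbourhood as covered, so later queries explore elsewhere. The scoring rule, the neighbourhood size `k`, the query `budget`, and the helper names `density_change_scores` and `detect_rare_categories` are all assumptions made for illustration. The loop needs no prior on the number of classes or their proportions.

```python
import math


def density_change_scores(points, k, dist=math.dist):
    """Hypothetical scoring: own density minus the lowest density nearby."""
    n = len(points)
    # For each point, the indices of all other points sorted by distance.
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        neighbours.append(order)
    # Local density proxy: inverse distance to the k-th nearest neighbour.
    density = []
    for i in range(n):
        r_k = dist(points[i], points[neighbours[i][k - 1]])
        density.append(1.0 / (r_k + 1e-12))
    # A point at the core of a compact cluster sitting in a sparse region has
    # much higher density than the sparsest of its 2k nearest neighbours.
    scores = [density[i] - min(density[j] for j in neighbours[i][:2 * k])
              for i in range(n)]
    return scores, neighbours


def detect_rare_categories(points, oracle, k=3, budget=5, dist=math.dist):
    """Hypothetical prior-free query loop; oracle(i) returns the label of point i."""
    scores, neighbours = density_change_scores(points, k, dist)
    covered = set()
    queries = []  # (index, label) pairs in query order
    for _ in range(budget):
        candidates = [i for i in range(len(points)) if i not in covered]
        if not candidates:
            break
        best = max(candidates, key=lambda c: scores[c])
        queries.append((best, oracle(best)))
        # Cover the queried point and its 2k nearest neighbours so that the
        # next query is drawn from a different region of the data.
        covered.add(best)
        covered.update(neighbours[best][:2 * k])
    return queries


if __name__ == "__main__":
    # Synthetic data: a sparse 11x11 majority grid plus a tight 5-point rare cluster.
    majority = [(float(x), float(y)) for x in range(11) for y in range(11)]
    rare = [(5.5 + dx, 5.5 + dy)
            for dx, dy in [(0, 0), (0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05)]]
    points = majority + rare
    labels = ["majority"] * len(majority) + ["rare"] * len(rare)
    queries = detect_rare_categories(points, oracle=lambda i: labels[i], k=3, budget=3)
    print([label for _, label in queries])
```

Covering each queried neighbourhood keeps the loop from repeatedly querying the same cluster. A different distance function can be passed through `dist` without changing the loop itself.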
