Rare Category Detection Forest

Rare category detection (RCD) aims to discover rare categories in a massive unlabeled data set with the help of a labeling oracle. A challenging task in RCD is discovering rare categories that are concealed by numerous data examples from major categories. Only a few algorithms have been proposed for this problem, and most of them have quadratic or cubic time complexity. In this paper, we propose a novel tree-based algorithm, RCD-Forest, with $$O(\varphi n \log(n/s))$$ time complexity and high query efficiency, where n is the size of the unlabeled data set. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our method.
