Nearest-Neighbor-Based Active Learning for Rare Category Detection

Rare category detection is an open challenge for active learning, especially in the de novo case (no labeled examples), but it is of significant practical importance for data mining. One example is detecting new patterns of financial transaction fraud, where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to optimize the probability of sampling tightly grouped minority classes, subject to a local smoothness assumption on the majority class. Results on both synthetic and real data sets are very positive. Each minority class is detected with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method [1], the prior best technique in the sparse literature on this topic.
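The local-density-differential idea can be illustrated with a small sketch. It is not the authors' algorithm, only a simplified illustration under stated assumptions. The data are assumed to be Euclidean points. `k` is a hypothetical parameter standing for the expected size of a minority class. The base radius `r` is the smallest k-th-nearest-neighbor distance in the data. At the t-th query the neighborhood grows to `t * r`, which is one possible way to realize a "variable scale". The functions `neighbor_counts` and `discover_classes` and the `oracle` callback are hypothetical names.

```python
import math
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def neighbor_counts(points: Sequence[Point], k: int) -> Tuple[float, List[int]]:
    """Return the base radius r and each point's neighbor count within r.

    r is the smallest k-th-nearest-neighbor distance in the data, so it is set
    by the most tightly grouped region (assumed here to be a minority class).
    """
    n = len(points)
    kth = []
    for i in range(n):
        dists = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        kth.append(dists[k - 1])
    r = min(kth)
    counts = [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= r)
        for i in range(n)
    ]
    return r, counts


def discover_classes(
    points: Sequence[Point],
    oracle: Callable[[int], Hashable],
    k: int,
    n_classes: int,
    budget: int,
) -> List[int]:
    """Query points until n_classes distinct labels are seen or the budget runs out.

    Returns the indices of the queried points, in query order.
    """
    r, counts = neighbor_counts(points, k)
    queried: List[int] = []
    queried_set: Set[int] = set()
    seen: Set[Hashable] = set()
    t = 1
    while len(seen) < n_classes and len(queried) < budget:
        radius = t * r  # the neighborhood scale grows with every query
        best, best_score = None, -math.inf
        for i, p in enumerate(points):
            if i in queried_set:
                continue
            # Largest drop in local density from x_i to any point within `radius`.
            # x_i itself is in that set, so the score is never negative.
            score = max(
                counts[i] - counts[j]
                for j, q in enumerate(points)
                if math.dist(p, q) <= radius
            )
            if score > best_score:  # strict ">" keeps the lowest index on ties
                best, best_score = i, score
        if best is None:  # every point has already been queried
            break
        queried.append(best)
        queried_set.add(best)
        seen.add(oracle(best))  # ask the labeling oracle for the true class
        t += 1
    return queried
```

As a toy usage example, take a 10 × 10 unit grid as the majority class (indices 0 to 99). Add five minority points packed within 0.125 of (4.5, 4.5) (indices 100 to 104), with the center point first. With `k=4`, `discover_classes(points, oracle, k=4, n_classes=2, budget=10)` returns `[100, 0]`. The first query lands on the center of the tight cluster, because its local density drops most sharply relative to its neighbors. On the smooth grid, every score is zero. The pairwise loops are quadratic in the number of points. A practical version would more likely use a spatial index for the neighbor queries.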

Jingrui He | Jaime G. Carbonell

References

[1] Andrew W. Moore, et al. Active Learning for Anomaly and Rare-Category Detection, 2004, NIPS.

[2] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[3] Yishay Mansour, et al. Active Sampling for Multiple Output Identification, 2006, COLT.

[4] Stephen D. Bay, et al. Large Scale Detection of Irregularities in Accounting Data, 2006, Sixth International Conference on Data Mining (ICDM'06).

[5] John Langford, et al. Agnostic active learning, 2006, J. Comput. Syst. Sci.

[6] Sanjoy Dasgupta, et al. Coarse sample complexity bounds for active learning, 2005, NIPS.