Open Category Detection with PAC Guarantees

Open category detection is the problem of detecting "alien" test instances that belong to categories or classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring the safety and accuracy of test set predictions. Unfortunately, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simple, but practically-relevant variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest and an unlabeled "contaminated" training set that contains a fraction $\alpha$ of alien examples. Under the assumption that we know an upper bound on $\alpha$, we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.

[1]  Robert P. W. Duin,et al.  Growing a multi-class classifier with a reject option , 2008, Pattern Recognit. Lett..

[2]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[3]  Hakan Cevikalp,et al.  Efficient object detection using cascades of nearest convex model classifiers , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Terrance E. Boult,et al.  Probability Models for Open Set Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Hanqing Lu,et al.  Face detection using one-class-based support vectors , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[6]  Tomás Pevný,et al.  Loda: Lightweight on-line detector of anomalies , 2016, Machine Learning.

[7]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[8]  Terrance E. Boult,et al.  Towards Open Set Deep Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jieping Ye,et al.  A Small Sphere and Large Margin Approach for Novelty Detection Using Training Data with Outliers , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Yang Yu,et al.  Learning with Augmented Class by Exploiting Unlabeled Data , 2014, AAAI.

[11]  H. D. Brunk,et al.  The Isotonic Regression Problem and its Dual , 1972 .

[12]  Thomas S. Huang,et al.  Relevance feedback in image retrieval: A comprehensive review , 2003, Multimedia Systems.

[13]  Ricardo da Silva Torres,et al.  Nearest neighbors distance ratio open-set classifier , 2016, Machine Learning.

[14]  Thomas G. Dietterich,et al.  Automated processing and identification of benthic invertebrate samples , 2010, Journal of the North American Benthological Society.

[15]  Zhi-Hua Zhou,et al.  Isolation Forest , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[16]  P. Massart The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality , 1990 .

[17]  Anderson Rocha,et al.  Toward Open Set Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  M. Wegkamp Lasso type classifiers with a reject option , 2007, 0705.2363.

[19]  Tadeusz Pietraszek,et al.  Optimizing abstaining classifiers using ROC analysis , 2005, ICML.

[20]  R. Srikant,et al.  Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks , 2017, ICLR.

[21]  Malik Yousef,et al.  One-Class SVMs for Document Classification , 2002, J. Mach. Learn. Res..

[22]  Lei Shu,et al.  DOC: Deep Open Classification of Text Documents , 2017, EMNLP.

[23]  Terrance E. Boult,et al.  Detecting and classifying scars, marks, and tattoos found in the wild , 2012, 2012 IEEE Fifth International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[24]  C. K. Chow,et al.  On optimum recognition error and reject tradeoff , 1970, IEEE Trans. Inf. Theory.

[25]  Thomas G. Dietterich,et al.  Systematic construction of anomaly detection benchmarks from real data , 2013, ODD '13.