Paper Information - Robust Bloom Filters for Large MultiLabel Classification Tasks

Robust Bloom Filters for Large MultiLabel Classification Tasks

This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low-dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approach is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large-scale multilabel datasets.
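To make the encoding idea concrete, the following is a minimal Python sketch of the naive (standard) Bloom filter representation of a label set, not the robust variant the paper proposes. The function names (`bit_positions`, `encode`, `decode`), the SHA-256-based hashing, and the sizes (1000 labels, 64 bits, 3 hash functions) are all assumptions chosen for illustration. In an MLC reduction, each bit of the code would be predicted by one binary classifier; here a single prediction error is simulated by flipping one bit by hand.

```python
import hashlib


def bit_positions(label, num_bits, num_hashes):
    """Map a label id to `num_hashes` bit positions in [0, num_bits).

    Hypothetical hashing scheme: one SHA-256 digest per seed.
    """
    positions = []
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{label}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def encode(label_set, num_bits, num_hashes):
    """Represent a label set as a binary vector (bitwise OR of label codes)."""
    bits = [0] * num_bits
    for label in label_set:
        for pos in bit_positions(label, num_bits, num_hashes):
            bits[pos] = 1
    return bits


def decode(bits, all_labels, num_hashes):
    """Return every label whose bit positions are all set.

    This is a plain linear scan over the labels, kept for simplicity.
    """
    num_bits = len(bits)
    return {
        label
        for label in all_labels
        if all(bits[p] for p in bit_positions(label, num_bits, num_hashes))
    }


# Illustrative sizes (assumptions, not taken from the paper).
NUM_LABELS, NUM_BITS, NUM_HASHES = 1000, 64, 3

true_labels = {3, 141, 592}
code = encode(true_labels, NUM_BITS, NUM_HASHES)
decoded = decode(code, range(NUM_LABELS), NUM_HASHES)

# Simulate one binary classifier error: a bit that should be 1 is predicted 0.
corrupted = list(code)
corrupted[bit_positions(3, NUM_BITS, NUM_HASHES)[0]] = 0
decoded_corrupted = decode(corrupted, range(NUM_LABELS), NUM_HASHES)
```

With an error-free code, `decoded` always contains `true_labels` and may also contain false positives. After the single flipped bit, label `3` is no longer recovered, along with any other true label that happens to share that bit. This illustrates the sensitivity to individual classifier errors that the abstract attributes to the naive approach.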
