In many real-world scenarios, we must make judgments in the presence of computational constraints. One common constraint arises when the features used to make a judgment each have differing acquisition costs, but there is a fixed total budget for a set of judgments. Particularly when a large number of classifications must be made in real time, an intelligent strategy for trading off accuracy against computational cost is essential. E-mail classification is an area where accurate and timely results require such a trade-off. We identify two scenarios where intelligent feature acquisition can improve classifier performance. In granular classification, we seek to classify e-mails with increasingly specific labels structured in a hierarchy, where each level of the hierarchy requires a different trade-off between cost and accuracy. In load-sensitive classification, we classify a set of instances within an arbitrary total budget for acquiring features. Our method, Adaptive Classifier Cascades (ACC), designs a policy to combine a series of base classifiers of increasing computational cost given a desired trade-off between cost and accuracy. Using this method, we learn the relationship between feature costs and label hierarchies (for granular classification) and between feature costs and total budgets (for load-sensitive classification). We evaluate our method on real-world e-mail datasets with realistic estimates of feature acquisition cost, and we demonstrate superior results compared to baseline classifiers that lack a granular, cost-sensitive feature acquisition policy.
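To make the cascade idea concrete, the minimal sketch below shows a generic cost-aware classifier cascade: stages are ordered by feature acquisition cost, and an instance is passed to the next, costlier stage only when the current stage's confidence falls below a threshold. The stage classifiers, feature costs, and the simple confidence-threshold rule here are hypothetical placeholders for illustration, not the paper's actual ACC policy.

```python
# Illustrative sketch only: a generic cost-aware classifier cascade.
# Stage predictors, feature costs, and thresholds are assumed/hypothetical;
# the real ACC policy is learned, not hand-set as below.

from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Stage:
    predict_proba: Callable[[object], float]  # P(positive) using this stage's features
    feature_cost: float                       # cost of acquiring this stage's features
    threshold: float                          # defer to a costlier stage below this confidence


def cascade_classify(stages: Sequence[Stage], instance: object) -> Tuple[int, float]:
    """Run stages in order of increasing cost; stop once a stage is confident enough.

    Returns the predicted label and the total feature acquisition cost spent.
    """
    spent = 0.0
    label = 0
    for i, stage in enumerate(stages):
        spent += stage.feature_cost
        p = stage.predict_proba(instance)
        confidence = max(p, 1.0 - p)
        label = int(p >= 0.5)
        is_last = (i == len(stages) - 1)
        if confidence >= stage.threshold or is_last:
            break  # confident enough, or no costlier stage remains
    return label, spent
```

Under this sketch, the per-stage thresholds would be tuned on held-out data so that the expected per-instance cost fits the total budget (load-sensitive classification) or so that deeper, more specific levels of the label hierarchy invoke the costlier stages (granular classification).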