Cost-sensitive classifier chains: Selecting low-cost features in multi-label classification

Abstract: Feature selection is one of the active challenges in multi-label classification, and many methods have been proposed in recent years. However, the existing approaches assume that all features have the same cost. This assumption may be inappropriate when acquiring feature values is costly. For example, in medical diagnosis each diagnostic value obtained from a clinical test has its own cost. In such cases it may be better to choose a model with acceptable classification performance but a much lower cost. We propose a novel method that incorporates feature cost information into the learning process. The method, named Cost-Sensitive Classifier Chains, combines classifier chains and penalized logistic regression with a modified elastic-net penalty that takes the costs of the features into account. We prove the stability of our algorithm and provide a bound on its generalization error. We also propose an adaptive version in which the penalty factors change as the consecutive models in the chain are fitted. The methods are applied to two real datasets, MIMIC-II and Hepatitis, for which the cost information is provided by experts. Moreover, we propose an experimental framework in which the features are observed with measurement errors and the costs depend on the quality of the features. The framework allows cost-sensitive methods to be compared on benchmark datasets for which cost information is not provided. The proposed method can be recommended when one wants to balance low cost and high prediction performance.
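
To make the idea of a cost-weighted penalty inside a classifier chain concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the modified elastic-net penalty, so the sketch assumes the L1 part is weighted by feature cost while the ridge part is left unweighted. The solver (proximal gradient descent), all function names, the `label_penalty` parameter, and the toy data are hypothetical choices made for illustration only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_penalized_logistic(X, y, factors, lam=0.05, alpha=0.9,
                           step=0.5, n_iter=300):
    """Logistic loss + lam * (alpha * sum_j factors[j] * |b_j|
    + (1 - alpha) / 2 * sum_j b_j^2), fitted by proximal gradient.
    ASSUMPTION: only the L1 part is cost-weighted; intercept unpenalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - t
                 for row, t in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                + lam * (1.0 - alpha) * beta[j] for j in range(p)]
        b0 -= step * g0
        # Gradient step, then cost-scaled shrinkage per coefficient.
        beta = [soft_threshold(b - step * g, step * lam * alpha * c)
                for b, g, c in zip(beta, grad, factors)]
    return b0, beta

def fit_chain(X, Y, costs, label_penalty=0.0, **kw):
    """Fit one model per label; model k also sees true labels 0..k-1.
    ASSUMPTION: label inputs get the hypothetical factor label_penalty."""
    models = []
    for k in range(len(Y[0])):
        Xk = [row + lab[:k] for row, lab in zip(X, Y)]
        factors = list(costs) + [label_penalty] * k
        yk = [lab[k] for lab in Y]
        models.append(fit_penalized_logistic(Xk, yk, factors, **kw))
    return models

def predict_chain(models, x):
    """Predict labels in chain order, feeding earlier predictions forward."""
    labels = []
    for b0, beta in models:
        z = b0 + sum(b * v for b, v in zip(beta, list(x) + labels))
        labels.append(1 if z > 0 else 0)
    return labels

# Toy data (made up): feature 1 duplicates feature 0 but costs 50x more.
X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 2
Y = [[row[0], row[2]] for row in X]
costs = [1.0, 50.0, 1.0]

models = fit_chain(X, Y, costs)
p = len(costs)
selected = [j for j in range(p) if any(beta[j] != 0.0 for _, beta in models)]
total_cost = sum(costs[j] for j in selected)
print("selected features:", selected, "total cost:", total_cost)
```

In this toy setting the expensive duplicate is never selected, because its shrinkage threshold exceeds any gradient the data can produce, while the cheap copy carries the same information. The cost of a fitted chain is counted once per feature, over the union of features used by any model in the chain.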
