Learning rules for multi-label classification: a stacking and a separate-and-conquer approach

Dependencies between labels are commonly regarded as the crucial issue in multi-label classification. Rules provide a natural way of describing such relationships symbolically. For instance, rules with label tests in the body can represent directed dependencies such as implications, subsumptions, or exclusions. Moreover, rules naturally capture both local and global label dependencies jointly. In this paper, we introduce two approaches for learning such label-dependent rules. Our first solution is a bootstrapped stacking approach that can be built on top of a conventional rule learning algorithm: we learn a separate ruleset for each label, but include the remaining labels as additional attributes in the training instances. The second approach goes one step further by adapting the commonly used separate-and-conquer algorithm to learn multi-label rules. The main idea is to re-include the covered examples together with their predicted labels, so that this information can be used when learning subsequent rules. Both approaches make label dependencies explicit in the rules. In addition, the use of standard rule learning techniques aimed at producing accurate predictions ensures that the rules found are useful for actual classification. Our experiments show (a) that the discovered dependencies contribute to the understanding and improve the analysis of multi-label datasets, and (b) that the multi-label rules found are crucial for predictive performance, as our proposed approaches outperform the baseline that uses conventional rules.
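As an illustration of the stacking idea described above, the following minimal sketch builds one training set per label in which the remaining labels are appended as additional attributes. It covers only this data-preparation step; the rule learner and the bootstrapped prediction step are not shown. The function name `stacked_training_sets` and the toy data are assumptions made for the example, not part of the paper.

```python
def stacked_training_sets(X, Y):
    """Build one training set per label (hypothetical helper).

    X: list of attribute vectors, Y: list of binary label vectors.
    For label j, the inputs are the original attributes followed by
    all labels except j; the target is label j itself.
    """
    n_labels = len(Y[0])
    datasets = []
    for j in range(n_labels):
        inputs = [
            list(x) + [y[k] for k in range(n_labels) if k != j]
            for x, y in zip(X, Y)
        ]
        targets = [y[j] for y in Y]
        datasets.append((inputs, targets))
    return datasets


# Toy data (assumed): 3 instances, 2 attributes, 3 labels.
X = [[1, 0], [0, 1], [1, 1]]
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]

sets = stacked_training_sets(X, Y)
inputs_0, targets_0 = sets[0]
print(inputs_0)   # attributes plus labels 1 and 2
print(targets_0)  # values of label 0
```

Any conventional rule learner could then be trained on each of these sets, so that label tests can appear in the rule bodies.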
