Multi-label classification search space in the MEKA software

This supplementary material describes the proposed multi-label classification (MLC) search spaces, which are built on the MEKA and WEKA software. First, we give an overview of 26 MLC algorithms and meta-algorithms in MEKA, presenting their main characteristics, such as hyper-parameters, dependencies and constraints. Second, we review 28 single-label classification (SLC) algorithms, preprocessing algorithms and meta-algorithms in WEKA. These SLC algorithms are included because they are part of the proposed MLC search spaces: several of the MLC algorithms used in this work are problem transformation methods, which first transform an MLC problem into one or several SLC problems and then solve them with one or more SLC models. Understanding the main characteristics of the SLC algorithms is therefore crucial to this work. Finally, we present a formal description of the search spaces by proposing a context-free grammar that encompasses the 54 learning algorithms. This grammar encodes the possible combinations of the learning algorithms, together with the constraints and dependencies among them.
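
To make the idea of a grammar-defined search space concrete, the following is a minimal sketch in Python. The grammar shown is a hypothetical toy, not the grammar proposed in this material: the non-terminal names, the handful of algorithm names used as terminals and the two helper functions are assumptions chosen only for illustration. It encodes a single dependency, namely that a problem transformation method must be paired with an SLC base classifier, whereas an algorithm adaptation method stands alone.

```python
import random
from itertools import product

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are non-terminals; each value lists the alternative productions.
GRAMMAR = {
    "<MLC>": [["<PT>", "<SLC>"], ["<AA>"]],       # transformation + base learner, or adaptation
    "<PT>": [["BR"], ["CC"], ["LC"]],             # problem transformation methods
    "<AA>": [["BPNN"]],                           # algorithm adaptation method
    "<SLC>": [["J48"], ["NaiveBayes"], ["SMO"]],  # single-label base classifiers
}


def derive_all(symbol):
    """Enumerate every terminal sequence derivable from `symbol`.

    This terminates only because the toy grammar is non-recursive.
    """
    if symbol not in GRAMMAR:
        return [(symbol,)]  # terminal symbol
    results = []
    for production in GRAMMAR[symbol]:
        # Derivations of each symbol in the production, combined in order.
        parts = [derive_all(s) for s in production]
        for combo in product(*parts):
            results.append(tuple(t for part in combo for t in part))
    return results


def sample(symbol, rng):
    """Draw one random derivation from `symbol` using the generator `rng`."""
    if symbol not in GRAMMAR:
        return (symbol,)
    production = rng.choice(GRAMMAR[symbol])
    return tuple(t for s in production for t in sample(s, rng))


search_space = derive_all("<MLC>")
candidate = sample("<MLC>", random.Random(0))
```

Here `search_space` holds every valid configuration of the toy grammar, and `candidate` is one configuration drawn at random. Invalid pairings, such as an algorithm adaptation method followed by a base classifier, cannot be derived, which is how a grammar can express constraints among algorithms.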
