Multi-Dimensional Classification with Super-Classes

Multi-dimensional classification generalizes the recently popularized task of multi-label classification: each data instance is associated with multiple class variables, and each variable may take more than two values rather than only a binary relevance value. Relatively little research has addressed multi-dimensional classification specifically. Although it shares a core goal with multi-label classification (modeling dependencies among classes), there are important differences, chiefly a much larger number of possible classifications. In this paper we present a method for multi-dimensional classification that draws on the most relevant multi-label research and combines it with novel developments. Using a fast method to model the conditional dependence between class variables, we form super-class partitions and use them to build multi-dimensional learners, learning each super-class as an ordinary class and thus explicitly modeling class dependencies. We also present a mechanism for dealing with the many class values inherent to super-classes, which makes learning efficient. To investigate the effectiveness of this approach, we carry out an empirical evaluation on a range of multi-dimensional datasets, under different evaluation metrics, and in comparison with high-performing multi-dimensional approaches from the literature. Analysis of the results shows that our approach offers performance gains over competing methods while keeping running time tractable.
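The super-class idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, and the partition is hand-picked here, whereas the paper derives it from measured conditional dependence between class variables. The sketch only shows how a partition turns several class variables into a smaller number of ordinary multi-class targets, and how a prediction maps back to the original variables.

```python
from math import prod


def count_joint_classifications(cardinalities):
    """Number of possible joint classifications for the given class-variable sizes."""
    return prod(cardinalities)


def encode_super_classes(Y, partition):
    """Group class variables into super-classes (hypothetical helper).

    Y is a list of rows, one tuple of class values per instance.
    partition is a list of groups, each a list of class-variable indices.
    Each super-class value is the tuple of original values in its group.
    """
    return [
        tuple(tuple(row[j] for j in group) for group in partition)
        for row in Y
    ]


def decode_super_classes(s_row, partition, n_classes):
    """Map one row of super-class values back to the original class variables."""
    y = [None] * n_classes
    for group, values in zip(partition, s_row):
        for j, value in zip(group, values):
            y[j] = value
    return tuple(y)


# Three class variables with 2, 3, and 2 possible values.
Y = [(0, 2, 1), (1, 0, 1)]
# Assumed partition: variables 0 and 2 are treated as dependent, variable 1 stands alone.
partition = [[0, 2], [1]]

S = encode_super_classes(Y, partition)
restored = [decode_super_classes(s, partition, n_classes=3) for s in S]
```

Each column of `S` can then be handed to any ordinary multi-class learner. Only the value combinations that actually occur in the training data appear as super-class values, which is typically far fewer than the full product of the class-variable sizes.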
