Improving Performance of Multiclass Classification by Inducing Class Hierarchies

Abstract In recent decades, a central issue in classification problems has been how to obtain better classifications, and the problem becomes even harder as the number of classes grows. In this multiclass scenario, the class labels are usually assumed to be independent of each other, and most techniques and methods proposed to improve classifier performance rely on this assumption. An alternative way to address the multiclass problem is to distribute the classes hierarchically into a collection of multiclass subproblems, reducing the number of classes involved in each local subproblem. In this paper, we propose a new method for inducing a class hierarchy from the confusion matrix of a multiclass classifier. We then use the class hierarchy to learn a tree-like hierarchy of classifiers that solves the original multiclass problem, much as the top-down hierarchical classification approach does for hierarchical domains. We experimentally evaluate the proposal on a collection of multiclass datasets, showing that, in general, the induced hierarchies outperform not only the original (flat) classification but also hierarchical approaches based on alternative ways of constructing the class hierarchy.
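The abstract describes the pipeline only at a high level. As a rough illustration, the minimal sketch below assumes a scikit-learn-style setting: it derives a class distance from cross-validated confusion rates, clusters the classes with average linkage, trains a binary "left group vs. right group" classifier at every internal node, and routes test examples top-down. The function names (build_class_hierarchy, train_hierarchy, predict_top_down) and these specific modelling choices are illustrative assumptions, not the authors' exact algorithm.

```python
# Illustrative sketch only: confusion-matrix-based class hierarchy + top-down classifiers.
# Distance, linkage, and per-node classifier choices are assumptions, not the paper's method.
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict


def class_distance_from_confusion(y_true, y_pred, labels):
    """The more often two classes are confused, the closer they are."""
    cm = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)      # row-normalise to confusion rates
    sim = (cm + cm.T) / 2.0                  # symmetrise the confusion rates
    np.fill_diagonal(sim, sim.max())         # a class is maximally similar to itself
    return sim.max() - sim                   # convert similarity into a distance


def build_class_hierarchy(X, y, base=RandomForestClassifier(random_state=0)):
    """Cluster the class labels using cross-validated confusions of a flat classifier."""
    labels = np.unique(y)
    y_hat = cross_val_predict(clone(base), X, y, cv=5)
    dist = class_distance_from_confusion(y, y_hat, labels)
    tree = to_tree(linkage(squareform(dist), method="average"))
    return tree, labels


def train_hierarchy(node, X, y, labels, base):
    """Return (set of classes under this node, trained subtree)."""
    if node.is_leaf():
        cls = labels[node.id]                # dendrogram leaves are single classes
        return {cls}, cls
    left_set, left_sub = train_hierarchy(node.get_left(), X, y, labels, base)
    right_set, right_sub = train_hierarchy(node.get_right(), X, y, labels, base)
    mask = np.isin(y, list(left_set | right_set))
    meta_y = np.isin(y[mask], list(left_set)).astype(int)   # 1 = left group, 0 = right group
    clf = clone(base).fit(X[mask], meta_y)
    return left_set | right_set, {"clf": clf, "children": {1: left_sub, 0: right_sub}}


def predict_top_down(subtree, x):
    """Route one example from the root down to a leaf (an original class label)."""
    while isinstance(subtree, dict):
        branch = int(subtree["clf"].predict(x.reshape(1, -1))[0])
        subtree = subtree["children"][branch]
    return subtree


# Possible usage (hypothetical data X_train, y_train, X_test):
# base = RandomForestClassifier(random_state=0)
# root, labels = build_class_hierarchy(X_train, y_train, base)
# _, model = train_hierarchy(root, X_train, y_train, labels, base)
# y_pred = [predict_top_down(model, x) for x in X_test]
```

Because agglomerative linkage yields a binary class tree, every internal node in this sketch hosts a two-way meta-classifier; the paper's method may instead produce multiway nodes or use a different confusion-based distance, but the top-down routing of test examples follows the same idea.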
