Pitfalls of Assessing Extracted Hierarchies for Multi-Class Classification

Using hierarchies of classes is one of the standard methods for solving multi-class classification problems. In the literature, selecting the right hierarchy is considered to play a key role in improving classification performance. Although different methods have been proposed, it is still poorly understood what makes one hierarchy-extraction method perform better or worse than another. To address this, we analyze and compare some of the most popular approaches to extracting hierarchies. We identify common pitfalls that may lead practitioners to draw misleading conclusions about their methods. To address some of these problems, we demonstrate that random hierarchies are an appropriate benchmark for assessing how the quality of a hierarchy affects classification performance. In particular, we show that the quality of the hierarchy can become irrelevant depending on the experimental setup: when sufficiently powerful classifiers are used, the final performance is not affected by the quality of the hierarchy. We also show that comparing hierarchical approaches against non-hierarchical ones might incorrectly suggest that the hierarchical approaches are superior. Our results confirm that datasets with a high number of classes generally present complex structures in how these classes relate to each other. In these datasets, the right hierarchy can dramatically improve classification performance.
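To make the random-hierarchy benchmark concrete, the following is a minimal sketch of one way such hierarchies could be generated: a binary tree built by recursively splitting the class set at random. The abstract does not specify the sampling procedure, so the function names (`random_hierarchy`, `leaves`), the binary-tree representation, and the split rule are all assumptions made for illustration. Note that this split rule does not sample uniformly over all possible tree structures.

```python
import random


def random_hierarchy(classes, rng):
    """Recursively split the class set at random into a binary tree.

    Hypothetical helper, not the paper's procedure. Leaves are single
    class labels; internal nodes are (left, right) tuples. Labels must
    not themselves be tuples.
    """
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    rng.shuffle(classes)
    # Choose a cut point so that both sides are non-empty.
    cut = rng.randint(1, len(classes) - 1)
    return (random_hierarchy(classes[:cut], rng),
            random_hierarchy(classes[cut:], rng))


def leaves(tree):
    """Return the class labels at the leaves of a hierarchy, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


# Draw several random hierarchies over six classes to serve as a baseline.
rng = random.Random(0)
baseline_trees = [random_hierarchy(range(6), rng) for _ in range(5)]
```

Under this setup, an extracted hierarchy would be judged by training the same classifier on it and on each tree in `baseline_trees`, then checking whether its performance falls clearly outside the spread of the random ones.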
