A survey of hierarchical classification across different application domains

This survey discusses the task of hierarchical classification. The literature on this field is scattered across very different application domains, so research in one domain is often carried out without awareness of methods developed in others. We define the task of hierarchical classification and discuss why some related tasks should not be considered hierarchical classification. We also present a new perspective on some existing hierarchical classification approaches and, based on that perspective, propose a new unifying framework for categorizing them. Finally, we review the empirical comparisons of existing methods reported in the literature and provide a high-level conceptual comparison of those methods, discussing their advantages and disadvantages.
