Hierarchical Multi-Label Classification: Going Beyond Generalization Trees

Abstract of a dissertation at the University of Miami. Dissertation supervised by Professor Miroslav Kubat. No. of pages in text: 183.

The traditional computational approach to automated classification assumes that each object should be assigned to only one out of two or more classes. However, some real-world applications digress from this generic scenario in two important ways. First, each example can belong to several classes simultaneously (multi-label classification). Second, the classes can be hierarchically ordered in the sense that some are more specific versions of others (hierarchical classification). Seeking to address both of these issues, the presented work deals with “hierarchical multi-label classification.” The task has recently received considerable attention; databases in various fields, including web repositories, digital libraries, and genomics, are known to be organized as hierarchies. Seeking to start with something relatively simple, scientists have focused on the special case where the inter-class relations are captured by a tree-structured hierarchy. This, however, is not enough. Very often, some classes have more than one parent, in which case the mutual relations (if they are known) have to be described by a hierarchy structured as a directed acyclic graph (DAG). This dissertation intends to contribute to this more general problem.

A literature survey indicates that, in non-hierarchical multi-label classification, good performance is achieved when a Support Vector Machine (SVM) is used to induce each class separately. That said, some experiments suggest that further improvement can be achieved by explicitly dealing with the problem of imbalanced training sets because, in most classes, negative examples heavily outnumber positive ones. The author proposes a solution in the form of a technique referred to as R-SVM; the idea is to re-adjust the offset of the SVM hyperplane to compensate for this imbalance.
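The idea of re-adjusting the hyperplane offset can be illustrated with a minimal sketch. It is not the R-SVM procedure itself, whose details the abstract does not give. It assumes that an already trained SVM has produced decision scores f(x) = w·x + b for a held-out set, and it simply picks the score threshold t with the best F1; predicting "positive" when f(x) >= t is the same as replacing the offset b by b - t. The function names, the F1 criterion, and the toy scores are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 of the positive class, computed from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def adjust_threshold(scores, y_true):
    """Pick the threshold on SVM decision scores that maximizes F1.

    Starts from the default threshold 0 (the unshifted hyperplane) and
    tries every observed score as a candidate threshold.
    """
    best_t = 0.0
    best_f1 = f1_score(y_true, [s >= 0.0 for s in scores])
    for t in sorted(set(scores)):
        f1 = f1_score(y_true, [s >= t for s in scores])
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical held-out decision scores for one imbalanced class:
# two of the three positives fall on the negative side of the hyperplane.
scores = [-1.6, -1.2, -0.9, -0.7, -0.4, -0.3, 0.2, 0.5]
labels = [0, 0, 0, 0, 1, 1, 0, 1]

default_f1 = f1_score(labels, [s >= 0.0 for s in scores])
best_t, best_f1 = adjust_threshold(scores, labels)
print(default_f1, best_t, best_f1)
```

On this toy data the default threshold recovers only one of the three positives, while the shifted threshold recovers all three at the cost of one false positive. In practice the threshold should be chosen on data not used for training, since tuning it on the training set tends to overfit.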
Experiments in the first part of this dissertation rely on data from text-categorization domains. More important, however, is the second part, which focuses on hierarchical multi-label classification. Here, the author proposes a new technique, HR-SVM, that essentially constitutes a hierarchical extension of R-SVM, proceeding in a top-down fashion from more general to more specific classifiers. The weakness of this approach is known as “error propagation”: examples misclassified at higher levels are propagated down the hierarchy, resulting in poor performance at the lower levels. HR-SVM contains a mechanism to correct this kind of error. The system has been subjected to extensive experiments with many domains from the field of gene function prediction. The results show that the new technique compares favorably with other existing approaches along various performance criteria.
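The top-down scheme and the error-propagation problem can be sketched in a few lines. This is a generic illustration, not HR-SVM: it omits the correction mechanism, which the abstract does not describe. The per-class classifiers are hypothetical stand-ins for trained SVMs, and the rule that a class is evaluated when at least one of its parents is predicted positive is an assumed policy for DAG-structured hierarchies.

```python
from collections import deque


def top_down_predict(x, parents, classifiers):
    """Predict the label set of x by descending a class DAG.

    parents:     dict mapping every class to the list of its parent classes
                 (an empty list marks a top-level class).
    classifiers: dict mapping every class to a function x -> bool.
    A class is evaluated only if it is top-level or at least one of its
    parents was predicted positive (assumed policy).
    """
    # Invert the parent lists to obtain child lists.
    children = {c: [] for c in parents}
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)

    predicted, visited = set(), set()
    queue = deque(c for c, ps in parents.items() if not ps)
    while queue:
        c = queue.popleft()
        if c in visited:
            continue
        visited.add(c)
        if classifiers[c](x):
            predicted.add(c)
            queue.extend(children[c])  # descend only below accepted classes
    return predicted


# Hypothetical DAG: class C has two parents (A and B); D is a child of C.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}

# Stand-in classifiers: each class accepts x if its toy score is positive.
classifiers = {c: (lambda x, c=c: x[c] > 0) for c in parents}

# A is accepted, so C and then D are reached.
x_ok = {"A": 1.0, "B": -1.0, "C": 1.0, "D": 1.0}
print(top_down_predict(x_ok, parents, classifiers))

# Error propagation: both top-level classifiers reject, so C and D are
# never evaluated even though their own classifiers would accept.
x_lost = {"A": -1.0, "B": -1.0, "C": 1.0, "D": 1.0}
print(top_down_predict(x_lost, parents, classifiers))
```

The second example shows why a mistake near the top of the hierarchy is costly: every descendant class is lost along with it, regardless of how well the lower-level classifiers perform.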
