A Novel Stacking Method for Multi-label Classification

Abstract of a dissertation at the University of Miami. Dissertation supervised by Professor Miroslav Kubat. No. of pages in text: 128.

Classical machine learning algorithms were tailored to automatically classify examples that belong to mutually exclusive classes: each example belongs to exactly one class out of a finite set. In realistic applications, however, examples often belong to more than one class at the same time. For example, a text document that belongs to Geography may also be labeled as Geology. Perhaps due to the popularity of its applications, this category of problems has garnered great research interest over the past decade.

A widely popular approach, called Binary Relevance (BR), is to induce a separate classifier for each class; each classifier determines whether its class is relevant for a given example. Although BR has shown some success, researchers have pointed out a critical drawback. By targeting each class independently, the learner does not model class correlations: knowing that an example belongs to class X may indicate that it is also likely to belong to class Y. Conversely, the same information can make the example less likely to belong to class Z. Research groups have sought to incorporate class correlation information into BR by using the class labels as additional example features. Since the classes of an unseen instance are unknown, the missing values are typically filled in using the outputs of other classifiers, which makes them prone to errors.

This dissertation identifies two weaknesses in existing methods: unnecessary label correlations and error propagation. To overcome these problems, it introduces a new multi-label classification method, called PruDent. Experiments over a broad range of benchmark datasets indicate that PruDent compares favorably with existing state-of-the-art methods. Additionally, PruDent improves classification accuracy while maintaining linear complexity in the number of classes.
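
The two ideas described above, Binary Relevance and the use of predicted class labels as additional features, can be illustrated with a minimal sketch. This is not PruDent: the abstract does not give its pruning or confidence mechanisms, so the code shows only plain BR and a generic two-level stacked BR. The toy base learner (`CentroidClassifier`), the helper names, and the data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


class CentroidClassifier:
    """Toy binary learner (an assumption, not from the dissertation):
    predicts 1 if the example is closer to the mean of the positive
    training examples than to the mean of the negative ones."""

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "CentroidClassifier":
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos = self._mean(pos) if pos else None
        self.neg = self._mean(neg) if neg else None
        return self

    @staticmethod
    def _mean(rows: List[Sequence[float]]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    @staticmethod
    def _dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def predict_one(self, x: Sequence[float]) -> int:
        if self.pos is None:
            return 0
        if self.neg is None:
            return 1
        return 1 if self._dist(x, self.pos) < self._dist(x, self.neg) else 0


def fit_binary_relevance(X, Y) -> List[CentroidClassifier]:
    """Binary Relevance: one independent binary classifier per class."""
    n_labels = len(Y[0])
    return [CentroidClassifier().fit(X, [row[j] for row in Y])
            for j in range(n_labels)]


def predict_binary_relevance(models, x) -> List[int]:
    return [m.predict_one(x) for m in models]


def fit_stacked(X, Y) -> Tuple[list, list]:
    """Generic stacked BR: the second level sees the original features
    plus the first level's label predictions as extra features."""
    base = fit_binary_relevance(X, Y)
    X_aug = [list(x) + predict_binary_relevance(base, x) for x in X]
    meta = fit_binary_relevance(X_aug, Y)
    return base, meta


def predict_stacked(base, meta, x) -> List[int]:
    # True labels are unknown for an unseen example, so the extra
    # features are filled in with first-level predictions. Any mistake
    # made here is passed on to the second level (error propagation).
    guesses = predict_binary_relevance(base, x)
    return predict_binary_relevance(meta, list(x) + guesses)


# Toy data: labels 0 and 1 co-occur, label 2 excludes them.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
Y = [[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 0]]

base, meta = fit_stacked(X, Y)
print(predict_binary_relevance(base, [0.0, 0.1]))  # plain BR
print(predict_stacked(base, meta, [0.8, 0.9]))     # stacked BR
```

In this sketch every second-level classifier receives every first-level prediction, which is the kind of indiscriminate use of label correlations, with no check on how reliable the first-level outputs are, that the abstract identifies as a weakness of existing methods.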
