Multilabel Classification

This book focuses on multilabel classification and related topics. Multilabel classification is one specific type of classification, and classification is in turn one of the usual tasks in the data mining field. Data mining itself can be seen as one step in a broader process, the discovery of new knowledge from databases. The goal of this first chapter is to introduce all these concepts and set the working context for the topics covered in the following chapters. A global outline is given in Sect. 1.1. Section 1.2 provides an overview of the whole Knowledge Discovery in Databases process. Section 1.3 introduces the essential preprocessing tasks. The different learning styles currently in use are then explained in Sect. 1.4, and lastly multilabel classification is introduced and compared with other traditional types of classification in Sect. 1.5.
