A Composite Likelihood View for Multi-Label Classification

Learning to classify multiple labels is challenging when training samples are limited. Problem decomposition [24] is widely used in this setting: the original problem is decomposed into a set of easier-to-learn subproblems, and the predictions from these subproblems are combined to make the final decision. In this paper we show the connection between composite likelihoods [17] and many multi-label decomposition methods, e.g., one-vs-all, one-vs-one, calibrated label ranking, and probabilistic classifier chains. This connection holds promise for improving problem decomposition in both the choice of subproblems and the combination of subproblem decisions. As an attempt to exploit this connection, we design a composite marginal method that improves pairwise decomposition. Pairwise label comparisons, which seem to be a natural choice for subproblems, are replaced by bivariate label densities, which are more informative and are natural components of a composite likelihood. For combining subproblem decisions, we propose a new mean-field approximation that minimizes a composite divergence and is potentially more robust to inaccurate estimates in the subproblems. Empirical studies on five data sets show that, given limited training samples, the proposed method outperforms many alternatives.
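To make the idea of combining bivariate label densities concrete, the following is a minimal sketch of a pairwise composite log-likelihood, i.e., a weighted sum of log p(y_i, y_j | x) over label pairs, evaluated at one test point. It is an illustration under stated assumptions, not the method of the paper: the function names, the optional pair weights, and the numeric tables are hypothetical, and decoding is done by brute-force enumeration over label vectors rather than by the mean-field approximation described above.

```python
import itertools
import math


def pairwise_composite_loglik(y, pair_probs, weights=None):
    """Weighted sum of log bivariate probabilities over all label pairs.

    y          : tuple of 0/1 labels
    pair_probs : dict mapping (i, j) to a 2x2 table with
                 table[a][b] = estimated p(y_i = a, y_j = b | x)
    weights    : optional dict of per-pair weights (assumed 1.0 if omitted)
    """
    total = 0.0
    for (i, j), table in pair_probs.items():
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * math.log(table[y[i]][y[j]])
    return total


def decode_by_enumeration(num_labels, pair_probs):
    """Return the label vector with the highest pairwise composite log-likelihood.

    Enumerates all 2**num_labels vectors, so it is only feasible for few labels.
    """
    candidates = itertools.product((0, 1), repeat=num_labels)
    return max(candidates, key=lambda y: pairwise_composite_loglik(y, pair_probs))


# Hypothetical bivariate estimates for three labels at a single test point.
pair_probs = {
    (0, 1): [[0.10, 0.10], [0.20, 0.60]],
    (0, 2): [[0.15, 0.05], [0.60, 0.20]],
    (1, 2): [[0.20, 0.10], [0.55, 0.15]],
}

best = decode_by_enumeration(3, pair_probs)
print(best, pairwise_composite_loglik(best, pair_probs))
```

With these hypothetical tables, the vector (1, 1, 0) is favored by all three pairs and is selected. Enumeration scales exponentially in the number of labels, which is one reason an approximate combination step such as a mean-field approximation becomes attractive for larger label sets.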

[1] Geert Verbeke, et al. Pairwise Fitting of Mixed Models for the Joint Modeling of Multivariate Longitudinal Profiles, 2006, Biometrics.

[2] Nils Lid Hjort, et al. ML, PL, QL in Markov Chain Models, 2008.

[3] K. Mardia, et al. Maximum likelihood estimation using composite likelihoods for closed exponential families, 2009.

[4] D. Cox. The Analysis of Multivariate Binary Data, 1972.

[5] Geoff Holmes, et al. Multi-label Classification Using Ensembles of Pruned Sets, 2008, Eighth IEEE International Conference on Data Mining.

[6] Concha Bielza, et al. Bayesian Chain Classifiers for Multidimensional Classification, 2011, IJCAI.

[7] Grigorios Tsoumakas, et al. Random k-Labelsets: An Ensemble Method for Multilabel Classification, 2007, ECML.

[8] Ryan M. Rifkin, et al. In Defense of One-Vs-All Classification, 2004, J. Mach. Learn. Res.

[9] Nir Friedman, et al. Probabilistic Graphical Models - Principles and Techniques, 2009.

[10] Kun Zhang, et al. Multi-label learning by exploiting label dependency, 2010, KDD.

[11] Geoff Holmes, et al. Classifier chains for multi-label classification, 2009, Machine Learning.

[12] Kanti V. Mardia, et al. A multivariate von Mises distribution with applications to bioinformatics, 2008.

[13] Andrew McCallum, et al. Collective multi-label classification, 2005, CIKM '05.

[14] Nitesh V. Chawla, et al. Editorial: special issue on learning from imbalanced data sets, 2004, SKDD.

[15] Chih-Jen Lin, et al. Probability Estimates for Multi-class Classification by Pairwise Coupling, 2003, J. Mach. Learn. Res.

[16] Bruce G. Lindsay, et al. Issues and Strategies in the Selection of Composite Likelihoods, 2011.

[17] David J. Nott, et al. A pairwise likelihood approach to analyzing correlated binary data, 2000.

[18] Eyke Hüllermeier, et al. Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains, 2010, ICML.

[19] N. Reid, et al. An Overview of Composite Likelihood Methods, 2011.

[20] D. Cox, et al. A note on pseudolikelihood constructed from marginal densities, 2004.

[21] Grigorios Tsoumakas, et al. Mining Multi-label Data, 2010, Data Mining and Knowledge Discovery Handbook.

[22] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine-mediated learning.

[23] J. Besag. Spatial Interaction and the Statistical Analysis of Lattice Systems, 1974.

[24] G. Molenberghs, et al. Models for Discrete Longitudinal Data, 2005.

[25] Carlo Gaetan, et al. Composite likelihood methods for space-time data, 2006.

[26] Eyke Hüllermeier, et al. Label ranking by learning pairwise preferences, 2008, Artif. Intell.

[27] Johannes Fürnkranz. Round Robin Classification, 2002, J. Mach. Learn. Res.

[28] Eyke Hüllermeier, et al. Multilabel classification via calibrated label ranking, 2008, Machine Learning.