A Deep Interpretation of Classifier Chains

In the “classifier chains” (CC) approach for multi-label classification, the predictions of binary classifiers are cascaded along a chain as additional features. This method has attained high predictive performance, and is receiving increasing analysis and attention in the recent multi-label literature, although a deep understanding of its performance is still taking shape. In this paper, we show that CC gets predictive power from leveraging labels as additional stochastic features, contrasting with many other methods, such as stacking and error correcting output codes, which use label dependence only as kind of regularization. CC methods can learn a concept which these cannot, even supposing the same base classifier and hypothesis space. This leads us to connections with deep learning (indeed, we show that CC is competitive precisely because it is a deep learner), and we employ deep learning methods – showing that they can supplement or even replace a classifier chain. Results are convincing, and throw new insight into promising future directions.

[1]  Luca Martino,et al.  Efficient monte carlo methods for multi-dimensional learning with classifier chains , 2012, Pattern Recognit..

[2]  Geoff Holmes,et al.  Classifier chains for multi-label classification , 2009, Machine Learning.

[3]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[4]  Marvin Minsky,et al.  Perceptrons: An Introduction to Computational Geometry , 1969 .

[5]  Eyke Hüllermeier,et al.  Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains , 2010, ICML.

[6]  Eyke Hüllermeier,et al.  An Analysis of Chaining in Multi-Label Classification , 2012, ECAI.

[7]  Saso Dzeroski,et al.  An extensive experimental comparison of methods for multi-label learning , 2012, Pattern Recognit..

[8]  Dianhui Wang,et al.  Extreme learning machines: a survey , 2011, Int. J. Mach. Learn. Cybern..

[9]  Zhi-Hua Zhou,et al.  Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization , 2006, IEEE Transactions on Knowledge and Data Engineering.

[10]  Rayid Ghani,et al.  Using Error-Correcting Codes for Text Classification , 2000, ICML.

[11]  Jesse Read,et al.  A distributed particle filter for nonlinear tracking in wireless sensor networks , 2014, Signal Process..

[12]  Peter A. Flach,et al.  Evaluation Measures for Multi-class Subgroup Discovery , 2009, ECML/PKDD.

[13]  David Barber,et al.  Bayesian reasoning and machine learning , 2012 .

[14]  Charles Elkan,et al.  Learning and Inference in Probabilistic Classifier Chains with Beam Search , 2012, ECML/PKDD.

[15]  Jesse Read,et al.  A Deep Interpretation of Classifier Chains , 2014, IDA.

[16]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[17]  Eyke Hüllermeier,et al.  Combining Instance-Based Learning and Logistic Regression for Multilabel Classification , 2009, ECML/PKDD.

[18]  Eyke Hüllermeier,et al.  On label dependence and loss minimization in multi-label classification , 2012, Machine Learning.

[19]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[20]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[21]  W. T. Miller,et al.  CMAC: an associative neural network alternative to backpropagation , 1990, Proc. IEEE.

[22]  Concha Bielza,et al.  Bayesian Chain Classifiers for Multidimensional Classification , 2011, IJCAI.

[23]  Sunita Sarawagi,et al.  Discriminative Methods for Multi-labeled Classification , 2004, PAKDD.