Multi-label classification via multi-target regression on data streams

Multi-label classification is becoming increasingly important in data mining applications. Many efficient methods exist for the classical batch setting, but comparatively few exist for the streaming setting. In this paper, we propose a new methodology for multi-label classification via multi-target regression in a streaming setting and develop a streaming multi-target regressor, iSOUP-Tree, that uses this approach. We experimentally evaluated two variants of the iSOUP-Tree algorithm and determined that the use of regression trees is advisable over the use of model trees. Furthermore, we compared our results to the state of the art and found that the iSOUP-Tree method is comparable to other streaming multi-label learners. This motivates the potential use of iSOUP-Tree as a base learner in an ensemble setting.
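The general idea of multi-label classification via multi-target regression can be sketched roughly as follows: each label set is encoded as a vector of 0/1 numeric targets, a streaming multi-target regressor is updated one example at a time, and its numeric predictions are thresholded back into a label set. This is only a minimal illustration under stated assumptions. The RunningMeanRegressor below is a hypothetical stand-in that ignores the input features and is not iSOUP-Tree; the 0.5 threshold, the function names, and the toy stream are likewise assumptions, not details taken from the paper.

```python
from typing import List, Set


class RunningMeanRegressor:
    """Hypothetical toy streaming multi-target regressor (NOT iSOUP-Tree).

    It ignores the features and predicts the running mean of each target,
    which is enough to show the surrounding transformation steps.
    """

    def __init__(self, n_targets: int) -> None:
        self.n_seen = 0
        self.means = [0.0] * n_targets

    def learn_one(self, x: List[float], y: List[float]) -> None:
        # Incremental mean update: one pass, constant memory per target.
        self.n_seen += 1
        for j, value in enumerate(y):
            self.means[j] += (value - self.means[j]) / self.n_seen

    def predict_one(self, x: List[float]) -> List[float]:
        return list(self.means)


def labels_to_targets(labels: Set[str], label_space: List[str]) -> List[float]:
    """Encode a label set as one 0/1 numeric target per possible label."""
    return [1.0 if name in labels else 0.0 for name in label_space]


def targets_to_labels(
    scores: List[float], label_space: List[str], threshold: float = 0.5
) -> Set[str]:
    """Decode numeric predictions into a label set (threshold is an assumption)."""
    return {name for name, score in zip(label_space, scores) if score > threshold}


label_space = ["a", "b", "c"]
stream = [
    ([0.1], {"a", "b"}),
    ([0.4], {"a", "b"}),
    ([0.3], {"a"}),
    ([0.9], {"a", "b", "c"}),
]

model = RunningMeanRegressor(n_targets=len(label_space))
for features, labels in stream:
    # Learn from each example as it arrives; examples are never revisited.
    model.learn_one(features, labels_to_targets(labels, label_space))

predicted = targets_to_labels(model.predict_one([0.5]), label_space)
print(sorted(predicted))
```

Any streaming multi-target regressor that exposes comparable learn and predict steps could replace the toy model; the encoding and thresholding steps around it would stay the same.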
