A New Approach for Improving Accuracy of Multi Label Stream Data

Many real-world problems involve data that can be viewed as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as learners must adapt to change using limited time and memory. Classification aims to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification in which a set of labels is associated with a single instance. It is used in modern applications such as text classification, functional genomics, image classification, and music categorization. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. It also presents a comparative analysis of multi-label classification methods, first on the basis of a theoretical study and then on the basis of simulations run on various data sets.
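To make the notion of a label set attached to a single instance concrete, the following is a minimal sketch, not the method evaluated in this paper. It encodes label sets as binary indicator vectors and computes Hamming loss, one commonly used multi-label evaluation measure. The label names, the example data, and the helper names `to_indicator` and `hamming_loss` are illustrative assumptions.

```python
from typing import List, Sequence, Set


def to_indicator(label_set: Set[str], all_labels: Sequence[str]) -> List[int]:
    """Encode a label set as a 0/1 vector over the full label vocabulary."""
    return [1 if label in label_set else 0 for label in all_labels]


def hamming_loss(
    true_sets: Sequence[Set[str]],
    pred_sets: Sequence[Set[str]],
    all_labels: Sequence[str],
) -> float:
    """Fraction of instance-label pairs that are predicted incorrectly."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    # The symmetric difference counts labels that are missing or wrongly added.
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(all_labels))


# Hypothetical label vocabulary and data, for illustration only.
ALL_LABELS = ["sports", "politics", "economy"]
true_sets = [{"sports"}, {"politics", "economy"}]
pred_sets = [{"sports", "economy"}, {"politics"}]

loss = hamming_loss(true_sets, pred_sets, ALL_LABELS)
```

Here each instance carries a set of labels rather than a single class, so a prediction can be partially correct. Hamming loss reflects this by scoring each instance-label pair separately.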
