Social Stream Classification with Emerging New Labels

Classification of social streams, such as the tweet stream generated by Twitter users in chronological order, is an important research topic of well-recognized practical value, and it has become increasingly popular for social data. A salient, and perhaps the most interesting, feature of such user-generated content is its never-failing novelty. This novelty challenges most traditional pre-trained classification models: they are built on a fixed label set and therefore fail to identify new labels as they emerge. In this paper, we study the problem of classifying social streams with emerging new labels and propose a novel ensemble framework that integrates an instance-based learner and a label-based learner by means of completely-random trees. The proposed framework not only classifies known labels in the multi-label scenario, but also detects emerging new labels and updates itself within the data stream. Extensive experiments on a real-world stream data set from Weibo, a Chinese micro-blogging platform, demonstrate the superiority of our approach over state-of-the-art methods.
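
The abstract does not spell out how completely-random trees are used inside the framework. As a rough illustration only, the Python sketch below shows one common way such trees can flag instances that look unlike the known data: each tree picks its split attribute and split value at random, and an instance that is separated from the training sample after only a few splits is treated as a candidate for a new label. This is an isolation-style scoring scheme assumed for illustration, not the authors' ensemble; every name here (`build_tree`, `path_length`, `mean_path_length`, the 2-D Gaussian toy data, the sample size of 64, the 50 trees) is hypothetical.

```python
import math
import random


def expected_depth(n):
    """Average depth of an unsuccessful search among n points; credits unsplit leaves."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, rng, depth=0, max_depth=8):
    """Grow one completely-random tree: attribute and cut point are both chosen at random."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split on this attribute
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(tree, x, depth=0):
    """Number of splits needed to route x to a leaf, plus a correction for leaf size."""
    if "dim" not in tree:
        return depth + expected_depth(tree["size"])
    branch = "left" if x[tree["dim"]] < tree["cut"] else "right"
    return path_length(tree[branch], x, depth + 1)


def mean_path_length(forest, x):
    """Average path length over all trees; shorter means x is easier to isolate."""
    return sum(path_length(t, x) for t in forest) / len(forest)


# Toy data (assumption): instances of known labels cluster around the origin.
rng = random.Random(0)
known = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
forest = [build_tree(rng.sample(known, 64), rng) for _ in range(50)]

known_len = mean_path_length(forest, (0.0, 0.0))  # resembles the known data
novel_len = mean_path_length(forest, (8.0, 8.0))  # far from anything seen so far
print(known_len, novel_len)
```

In a streaming setting, instances whose mean path length falls below some threshold could be buffered as candidates for a new label, and the trees rebuilt once enough candidates accumulate. The threshold and the update policy are design choices that this sketch leaves open.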
