Research and Implementation of a Multi-label Learning Algorithm for Chinese Text Classification

Multi-label learning has received significant attention in the research community over the past few years. Traditional supervised learning techniques do not fit it well, because real-world objects can be complex and carry multiple semantic meanings simultaneously. In this work, our goal is to mine the product attributes mentioned in customer comments from JD.com. This task is fundamental for businesses that study online market feedback from consumers. In this paper, we formally define three types of text categorization problems and analyze the relations among them. We then apply single-label multi-class classifiers to the new training datasets generated by our construction algorithms. Multi-label learning is thus transformed into a series of single-label binary classification problems, each deciding whether an unseen instance belongs to a certain class or not. Finally, we combine the outputs of all the single-label classifiers to obtain the multiple labels. At the end of the paper, we report comprehensive experiments that evaluate the performance of the proposed algorithms.
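The transformation described above (one constructed dataset per label, one single-label classifier per dataset, then assembling the per-label decisions) resembles the standard binary relevance strategy. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: comments are assumed to be already segmented into Chinese words, a hand-written multinomial Naive Bayes stands in for whatever base classifier the authors used, and all helper names (`build_binary_datasets`, `NaiveBayesBinary`, `train_binary_relevance`, `predict_labels`), labels, and toy data are hypothetical.

```python
import math
from collections import Counter


def build_binary_datasets(docs, label_sets, labels):
    """Hypothetical helper: for every label, build a binary training set in which
    a document is positive (1) if it carries that label and negative (0) otherwise."""
    return {
        lab: [(tokens, 1 if lab in labs else 0) for tokens, labs in zip(docs, label_sets)]
        for lab in labels
    }


class NaiveBayesBinary:
    """Minimal multinomial Naive Bayes with add-one smoothing (a stand-in base classifier)."""

    def fit(self, data):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for tokens, y in data:
            self.doc_counts[y] += 1
            self.word_counts[y].update(tokens)
        self.vocab_size = len(set(self.word_counts[0]) | set(self.word_counts[1]))
        self.totals = {y: sum(self.word_counts[y].values()) for y in (0, 1)}
        return self

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        best_y, best_score = 0, -math.inf
        for y in (0, 1):
            if self.doc_counts[y] == 0:
                continue  # class never seen in training
            # Log prior plus the add-one-smoothed log likelihood of each token.
            score = math.log(self.doc_counts[y] / n_docs)
            for t in tokens:
                score += math.log(
                    (self.word_counts[y][t] + 1) / (self.totals[y] + self.vocab_size)
                )
            if score > best_score:
                best_y, best_score = y, score
        return best_y


def train_binary_relevance(docs, label_sets, labels):
    """Train one binary classifier per label on its constructed dataset."""
    datasets = build_binary_datasets(docs, label_sets, labels)
    return {lab: NaiveBayesBinary().fit(data) for lab, data in datasets.items()}


def predict_labels(models, tokens):
    """Assemble the per-label decisions into one predicted label set."""
    return {lab for lab, model in models.items() if model.predict(tokens) == 1}


# Toy comments, assumed to be already word-segmented, with hypothetical attribute labels.
docs = [
    ["屏幕", "清晰", "电池", "耐用"],
    ["物流", "很快", "包装", "完好"],
    ["电池", "续航", "差"],
    ["屏幕", "大", "物流", "慢"],
]
label_sets = [{"screen", "battery"}, {"logistics"}, {"battery"}, {"screen", "logistics"}]
labels = ["screen", "battery", "logistics"]

models = train_binary_relevance(docs, label_sets, labels)
print(sorted(predict_labels(models, ["屏幕", "电池", "续航"])))  # → ['battery', 'screen']
```

Any single-label classifier could replace `NaiveBayesBinary` as long as it exposes `fit` and `predict`. Note that this per-label decomposition treats labels independently; the sketch does not model correlations between product attributes.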
