Feature ranking for enhancing boosting-based multi-label text categorization

Abstract Boosting algorithms have proven effective for multi-label learning. As ensemble learning algorithms, they build classifiers by combining a set of weak hypotheses. Their high computational cost when learning from large volumes of data, such as text categorization datasets, is a real challenge. Most boosting algorithms, such as AdaBoost.MH, iteratively examine all training features to generate the weak hypotheses, which increases the learning time. RFBoost was introduced to address this problem with a rank-and-filter strategy: it first ranks the training features and then, in each learning iteration, filters and uses only a subset of the highest-ranked features to construct the weak hypotheses. This step shortens the learning time of RFBoost compared to AdaBoost.MH, as the number of weak hypotheses produced in each iteration is reduced to a very small number. As feature ranking is the core idea of RFBoost, this paper presents and investigates seven feature ranking methods (information gain, chi-square, GSS-coefficient, mutual information, odds ratio, F1 score, and accuracy) in order to improve RFBoost's performance. Moreover, an accelerated version of RFBoost, called RFBoost1, is also introduced. Rather than filtering a subset of the highest-ranked features, RFBoost1 selects only one feature, based on its weight, to build a new weak hypothesis. Experimental results on four benchmark datasets for multi-label text categorization (Reuters-21578, 20-Newsgroups, OHSUMED, and TMC2007) demonstrate that, among the methods evaluated for feature ranking, mutual information yields the best performance for RFBoost. In addition, the results show that RFBoost statistically outperforms both RFBoost1 and AdaBoost.MH on all datasets. Finally, RFBoost1 proved more efficient than AdaBoost.MH, making it a better alternative for addressing classification problems in real-life applications and expert systems.
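The rank-and-filter idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`mutual_information`, `rank_features`, `filter_top_k`), the pointwise form of mutual information, the use of the maximum score over labels, and the toy data are all assumptions made for illustration. Documents are represented as sets of terms and each document carries a set of labels.

```python
import math

def mutual_information(docs, labels, term, label):
    """Pointwise mutual information between a term's presence and a label.

    Assumed form: log( N * A / (n_term * n_label) ), where A is the number
    of documents that contain the term and carry the label.
    """
    n = len(docs)
    a = sum(1 for d, ls in zip(docs, labels) if term in d and label in ls)
    if a == 0:
        return float("-inf")  # term and label never co-occur
    n_term = sum(1 for d in docs if term in d)
    n_label = sum(1 for ls in labels if label in ls)
    return math.log(a * n / (n_term * n_label))

def rank_features(docs, labels):
    """Rank all terms by their best score over the labels (an assumption)."""
    vocabulary = sorted(set().union(*docs))
    all_labels = sorted(set().union(*labels))
    score = {
        t: max(mutual_information(docs, labels, t, c) for c in all_labels)
        for t in vocabulary
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(vocabulary, key=lambda t: (-score[t], t))

def filter_top_k(ranked, k):
    """Keep only the k highest-ranked features as weak-hypothesis candidates."""
    return ranked[:k]

# Hypothetical toy corpus: four documents, three labels.
docs = [{"oil", "price"}, {"oil", "export"}, {"game", "team"}, {"team", "win"}]
labels = [{"energy"}, {"energy", "trade"}, {"sport"}, {"sport"}]

ranked = rank_features(docs, labels)
candidates = filter_top_k(ranked, 2)
print(candidates)
```

A boosting loop built on this sketch would evaluate only the terms in `candidates` in a given iteration rather than the full vocabulary, which is where the reduction in learning time described in the abstract would come from.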
