Integrating associative rule-based classification with Naïve Bayes for text classification

Abstract

Associative classification (AC) integrates association rule mining with classification to increase the efficiency of the classification process. AC algorithms classify accurately and generate easy-to-understand rules. However, they suffer from two drawbacks: they produce a large number of classification rules, and they rely on pruning methods that may remove information vital to reaching the right decision. In this paper, a new hybrid AC algorithm (HAC) is proposed. HAC uses the Naïve Bayes (NB) algorithm to reduce the number of classification rules and to produce several rules that represent each attribute value. Two experiments are conducted, on an Arabic textual dataset and on the standard Reuters-21578 datasets, comparing HAC against six algorithms: J48, NB, classification based on associations (CBA), multi-class classification based on association rules (MCAR), expert multi-class classification based on association rules (EMCAR), and the fast associative classification algorithm (FACA). The results show that HAC achieved higher classification accuracy than MCAR, CBA, EMCAR, FACA, J48, and NB, with gains of 3.95%, 6.58%, 3.48%, 1.18%, 5.37%, and 8.05%, respectively. Furthermore, on the Reuters-21578 datasets, the results indicate that HAC performs well and consistently in terms of classification accuracy and F-measure.
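To make the notion of a classification rule concrete, the following is a minimal sketch of generic class association rule mining, in which a rule maps a set of terms to a class label and is kept only if its support and confidence pass given thresholds. It is not the HAC algorithm, whose details the abstract does not give; the function name `mine_class_rules`, the thresholds, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(docs, min_support=0.5, min_confidence=0.6, max_len=2):
    """Mine rules of the form (itemset -> label) from labeled documents.

    docs is a list of (set_of_terms, label) pairs. This is a brute-force
    illustration, not an efficient miner and not the HAC method.
    """
    n = len(docs)
    itemset_counts = Counter()  # documents containing the itemset
    rule_counts = Counter()     # documents containing the itemset with a given label

    for terms, label in docs:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(terms), k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), hits in rule_counts.items():
        support = hits / n                          # fraction of all documents matching the rule
        confidence = hits / itemset_counts[itemset]  # fraction of itemset matches with this label
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))

    # Rank by confidence, then support, then shorter itemsets first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), r[0]))
    return rules


# Hypothetical toy corpus: four documents, two classes.
toy_docs = [
    ({"oil", "price"}, "energy"),
    ({"oil", "export"}, "energy"),
    ({"wheat", "export"}, "grain"),
    ({"wheat", "price"}, "grain"),
]

rules = mine_class_rules(toy_docs)
for itemset, label, support, confidence in rules:
    print(itemset, "->", label, support, confidence)
```

Even on four short documents the miner counts twelve candidate rules before thresholding, which illustrates why real AC algorithms produce large rule sets and need pruning or, as proposed here, a further reduction step.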
