Building an Ensemble of Fine-Tuned Naive Bayesian Classifiers for Text Classification

Text classification is a domain in which the naive Bayesian (NB) learning algorithm performs remarkably well. However, further improving its performance with ensemble-building techniques has proved challenging because NB is a stable algorithm. This work shows that, while an ensemble of NB classifiers achieves little or no improvement in classification accuracy, an ensemble of fine-tuned NB classifiers can achieve a remarkable improvement. We propose a fine-tuning algorithm for text classification that is both more accurate and less stable than the NB algorithm and the fine-tuning NB (FTNB) algorithm. This makes it more suitable than FTNB for building ensembles of classifiers using bagging. Our empirical experiments, using 16 benchmark text-classification data sets, show a significant improvement for most of them.
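The ensemble scheme described above, bagging a set of NB base classifiers trained on bootstrap samples and combining their predictions by majority vote, can be sketched as follows. This is a minimal illustration only, assuming scikit-learn's MultinomialNB as a stand-in for the paper's fine-tuned (FTNB-style) base learner; the toy corpus, number of estimators, and random seed are all illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative corpus with two topical classes (sports vs. computing).
docs = [
    "the team won the match", "great goal in the final minute",
    "the striker scored twice", "the coach praised the defence",
    "a new cpu was released this year", "the laptop has fast memory",
    "update the software driver", "graphics card prices dropped",
]
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

vec = CountVectorizer()
X = vec.fit_transform(docs)

rng = np.random.default_rng(0)
n_estimators = 10

# Bagging: each base classifier is trained on a bootstrap sample
# (training documents drawn with replacement).
ensemble = []
for _ in range(n_estimators):
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    clf = MultinomialNB()          # the paper would fine-tune this base learner
    clf.fit(X[idx], labels[idx])
    ensemble.append(clf)

# Combine the base classifiers by unweighted majority vote.
X_test = vec.transform(["the keeper saved a late penalty",
                        "install the updated graphics driver"])
votes = np.stack([clf.predict(X_test) for clf in ensemble])  # (n_estimators, n_docs)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print(majority)  # predicted class index for each test document
```

Because plain NB is stable, the bootstrap replicas tend to produce nearly identical classifiers, so the vote adds little; the abstract's argument is that a less stable, fine-tuned base learner yields diverse enough ensemble members for bagging to pay off.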
