Combining Modifications to Multinomial Naive Bayes for Text Classification

Multinomial Naive Bayes (MNB) is a preferred classifier for many text classification tasks, owing to its simplicity and trivial scaling to large-scale problems. In terms of classification accuracy, however, it trails modern discriminative classifiers because of its strong assumptions about the data. This paper explores the optimized combination of popular modifications to generative models in the context of MNB text classification. To optimize the metaparameters introduced by these modifications, we explore direct-search optimization using random search algorithms. We evaluate 7 basic modifications and 4 search algorithms across 5 publicly available datasets, and compare the results to similarly optimized Multiclass Support Vector Machine (SVM) classifiers. The optimized modifications reduce classification errors by over 20% on average compared to baseline MNB models, closing the gap between mean SVM and MNB performance by over 60%. Some of the individual modifications are shown to have substantial and statistically significant effects, while the differences between the random search algorithms are smaller and not statistically significant. The evaluated modifications are potentially applicable to many applications of generative text modeling, where similar performance gains can be achieved.
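To make the general approach concrete, the sketch below shows one way a random search over the metaparameters of common MNB modifications (log TF transform, IDF weighting, document length normalization, smoothing) could be set up. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, the 20 Newsgroups dataset, and illustrative parameter ranges, and it uses cross-validated random sampling as a stand-in for the direct-search algorithms evaluated in the paper.

```python
# A minimal sketch (not the paper's actual setup): random search over
# metaparameters of common MNB modifications using scikit-learn. The
# dataset, parameter ranges, and search budget are assumptions chosen
# for illustration.
from scipy.stats import loguniform
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train = fetch_20newsgroups(subset="train")

pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),   # TF/IDF/length-normalization modifications
    ("mnb", MultinomialNB()),        # smoothing metaparameter alpha
])

param_distributions = {
    "tfidf__sublinear_tf": [True, False],  # log TF transform on/off
    "tfidf__use_idf": [True, False],       # IDF weighting on/off
    "tfidf__norm": ["l1", "l2", None],     # document length normalization
    "mnb__alpha": loguniform(1e-3, 1e1),   # smoothing strength
}

search = RandomizedSearchCV(pipeline, param_distributions,
                            n_iter=50, cv=3, n_jobs=-1, random_state=0)
search.fit(train.data, train.target)
print(search.best_params_, search.best_score_)
```

The combination of modifications is treated as a single point in metaparameter space, so any direct-search or random-sampling optimizer can be swapped in for the sampler used here.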
