Classification optimization for training a large dataset with Naïve Bayes

Book classification is widely used in digital libraries, and book rating prediction is crucial for improving services to readers. Commonly used techniques include decision trees, Naïve Bayes (NB), and neural networks. Mining book data also depends on feature selection, data pre-processing, and data preparation. This paper proposes solutions for optimizing knowledge representation and feature selection in order to enhance book classification and to identify appropriate classification algorithms. Several experiments were conducted, and NB was found to provide the best prediction results. With appropriate strategies for feature selection, data type selection, and data transformation, the accuracy and performance of NB can be improved to the point that it outperforms other classification algorithms.
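To make the combination of feature selection and NB concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes purely categorical features, ranks them by mutual information with the class (one of several possible selection criteria), and trains a Naïve Bayes classifier with Laplace smoothing on the selected features only. The toy book records, the feature meanings (genre, format, publisher group), and all function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information(column, labels):
    """Mutual information (in nats) between one categorical feature and the class."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    px = Counter(column)
    py = Counter(labels)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def select_features(rows, labels, k):
    """Return the indices of the k features with the highest mutual information."""
    scores = [
        (mutual_information([r[j] for r in rows], labels), j)
        for j in range(len(rows[0]))
    ]
    top = sorted(scores, key=lambda s: s[0], reverse=True)[:k]
    return sorted(j for _, j in top)


def project(rows, indices):
    """Keep only the selected feature columns of every row."""
    return [tuple(r[j] for j in indices) for r in rows]


class CategoricalNB:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # value_counts[j][label][value] = number of training rows with that value
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[j][y][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            # log P(y) + sum_j log P(x_j | y), with add-one smoothing
            score = math.log(cy / self.n)
            for j, v in enumerate(row):
                num = self.value_counts[j][y][v] + 1
                den = cy + len(self.values[j])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Hypothetical toy data: (genre, format, publisher group) -> rating class.
rows = [
    ("fiction", "hardcover", "a"),
    ("fiction", "paperback", "b"),
    ("fiction", "hardcover", "b"),
    ("fiction", "paperback", "a"),
    ("textbook", "hardcover", "a"),
    ("textbook", "paperback", "b"),
    ("textbook", "hardcover", "b"),
    ("textbook", "paperback", "a"),
]
labels = ["high", "high", "high", "high", "low", "low", "low", "low"]

selected = select_features(rows, labels, k=1)  # only genre is informative here
model = CategoricalNB().fit(project(rows, selected), labels)
prediction = model.predict(project([("fiction", "paperback", "b")], selected)[0])
```

In this toy set only the first feature carries information about the class, so the selection step discards the other two before training. On real book data the value of k and the selection criterion would need to be chosen by validation.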
