Paper information - Classification optimization for training a large dataset with Naïve Bayes

Classification optimization for training a large dataset with Naïve Bayes

Book classification is widely used in digital libraries, and book rating prediction is crucial to serving readers better. Commonly used techniques include decision trees, Naïve Bayes (NB), and neural networks. Moreover, mining book data depends on feature selection, data pre-processing, and data preparation. This paper proposes solutions for optimizing knowledge representation and feature selection to enhance book classification, and identifies appropriate classification algorithms. Several experiments were conducted, and NB was found to provide the best prediction results. With appropriate strategies for feature selection, data type selection, and data transformation, the accuracy and performance of NB can be improved to the point of outperforming other classification algorithms.
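To make the pipeline named in the abstract concrete, here is a minimal sketch in plain Python of data transformation, feature selection, and NB classification chained together. It is not the authors' implementation: equal-width binning and information-gain ranking are stand-ins chosen for illustration, and the toy book records and all function names (`discretize`, `select_features`, `CategoricalNB`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def discretize(values, bins=3):
    """Data transformation: equal-width binning of numeric values to 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """Information gain of one categorical column with respect to the class."""
    groups = defaultdict(list)
    for x, y in zip(column, labels):
        groups[x].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_features(rows, labels, k):
    """Feature selection: indices of the k columns with the highest gain."""
    n_cols = len(rows[0])
    gains = [(info_gain([r[j] for r in rows], labels), j) for j in range(n_cols)]
    return sorted(j for _, j in sorted(gains, reverse=True)[:k])

class CategoricalNB:
    """Naive Bayes over categorical attributes with Laplace smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (column, class) -> value counts
        self.values = defaultdict(set)      # column -> distinct values seen
        for row, y in zip(rows, labels):
            for j, x in enumerate(row):
                self.counts[(j, y)][x] += 1
                self.values[j].add(x)
        return self

    def predict(self, row):
        def log_posterior(y):
            score = math.log(self.class_counts[y] / self.n)
            for j, x in enumerate(row):
                num = self.counts[(j, y)][x] + 1  # Laplace smoothing
                den = self.class_counts[y] + len(self.values[j])
                score += math.log(num / den)
            return score
        return max(self.class_counts, key=log_posterior)

# Hypothetical toy records: (genre, price, an uninformative attribute).
genres = ["fiction", "fiction", "science", "science", "fiction", "science"]
prices = [5.0, 7.5, 30.0, 28.0, 6.0, 32.0]
noise = ["a", "b", "a", "b", "a", "b"]
labels = ["high", "high", "low", "low", "high", "low"]

rows = list(zip(genres, discretize(prices), noise))
keep = select_features(rows, labels, k=2)            # drops the noise column
reduced = [tuple(r[j] for j in keep) for r in rows]
model = CategoricalNB().fit(reduced, labels)
prediction = model.predict(("fiction", 0))
```

On this toy data the ranking keeps the genre and binned-price columns and discards the uninformative one, which is the kind of effect the abstract attributes to feature selection; real book data would need proper train/test evaluation.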

Thi Thanh Sang Nguyen | Pham Minh Thu Do

[1] I. Witten. Chapter 2 – Input: Concepts, Instances, and Attributes, 2011.

[2] Thi Thanh Sang Nguyen. Model-Based Book Recommender Systems using Naïve Bayes enhanced with Optimal Feature Selection, 2019, ICSCA.

[3] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[4] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[5] Jian Pei, et al. Classification: Advanced Methods, 2012.

[6] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[7] Alejandro Sobrino, et al. Designing a system to extract and interpret timed causal sentences in medical reports, 2018, J. Exp. Theor. Artif. Intell.

[8] Roger G. Stone, et al. Naive Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages, 2009.

[9] Dimitrios Gunopulos, et al. Feature selection for the naive bayesian classifier using decision trees, 2003, Appl. Artif. Intell.

[10] Ian H. Witten. Chapter 7 – Data Transformations, 2011.

[11] Jiawei Han, et al. 3 – Data Preprocessing, 2012.