Influence of Data Discretization on Efficiency of Bayesian Classifier for Authorship Attribution

Abstract Authorship attribution is one of the research areas in the data mining domain, and various methods can be employed for the task. This paper presents the results of research on the influence of data discretization on the efficiency of the Naive Bayes classifier. The analysis was carried out on datasets based on texts by two male and two female authors, using the WEKA data mining software framework. Binary classification was performed separately for both datasets over a wide range of discretization parameters in order to investigate the dependency between the way data are discretized and the quality of classification with the Naive Bayes method. The numerical results of the tests are compared and discussed, and some observations and conclusions are formulated.
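To make the pipeline described in the abstract concrete, the following is a minimal sketch of discretizing numeric features and then classifying them with Naive Bayes. It is not the WEKA setup used in the study: equal-width binning, Laplace smoothing, the function names, and the toy two-feature data with labels "A" and "B" are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def equal_width_cuts(values, bins):
    """Return the bins-1 interior cut points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def discretize(value, cuts):
    """Map a numeric value to the index of the interval it falls in."""
    return sum(value > c for c in cuts)


def train_nb(rows, labels, bins):
    """Fit cut points per feature and count (class, feature, bin) occurrences."""
    n_features = len(rows[0])
    cuts = [equal_width_cuts([r[j] for r in rows], bins)
            for j in range(n_features)]
    class_counts = Counter(labels)
    bin_counts = Counter()
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            bin_counts[(y, j, discretize(v, cuts[j]))] += 1
    return cuts, class_counts, bin_counts, bins


def predict_nb(model, row):
    """Pick the class with the highest log posterior (Laplace-smoothed)."""
    cuts, class_counts, bin_counts, bins = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / total)
        for j, v in enumerate(row):
            b = discretize(v, cuts[j])
            count = bin_counts.get((y, j, b), 0)
            score += math.log((count + 1) / (cy + bins))
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical toy data: two numeric features per text, two authors.
rows = [[0.10, 0.90], [0.15, 0.80], [0.20, 0.85],
        [0.80, 0.10], [0.85, 0.20], [0.90, 0.15]]
labels = ["A", "A", "A", "B", "B", "B"]

# The number of bins is the discretization parameter being varied.
predictions = {bins: predict_nb(train_nb(rows, labels, bins), [0.12, 0.88])
               for bins in (2, 3, 5)}
```

Varying `bins` here stands in for the range of discretization parameters mentioned in the abstract; on real data one would compare classification accuracy across those settings rather than a single prediction.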
