Accuracies and Training Times of Data Mining Classification Algorithms: An Empirical Comparative Study

Two important performance indicators for data mining algorithms are the accuracy of classification or prediction and the time taken for training. These indicators are useful for selecting the best algorithm for a classification or prediction task. Empirical studies of these performance indicators in data mining are few. This study was therefore designed to determine how data mining classification algorithms perform as the size of the input data increases. Three data mining classification algorithms (Decision Tree, Multi-Layer Perceptron (MLP) Neural Network, and Naive Bayes) were run on simulated data sets of varying size. The training times of the algorithms and the accuracies of their classifications were analyzed for the different data sizes. The results show that Naive Bayes takes the least time to train but is also the least accurate when compared with the MLP and Decision Tree algorithms.
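
The kind of measurement described above can be illustrated with a minimal sketch. It is not the study's actual setup: the simulated data (two Gaussian classes in two dimensions), the data sizes, and all function names (`simulate`, `train_naive_bayes`, `predict`, `benchmark`) are assumptions made for illustration, and only a simple Gaussian Naive Bayes is implemented. A Decision Tree or MLP implementation could be passed to the same `benchmark` function in its place.

```python
import math
import random
import time


def simulate(n, seed=0):
    """Generate n labeled points: two Gaussian classes in two dimensions.

    Hypothetical data generator; the study's simulated data is not specified here.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label  # class 0 near (0, 0), class 1 near (2, 2)
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def train_naive_bayes(rows):
    """Fit a class prior plus per-feature mean and variance for each class."""
    model = {}
    for label in {y for _, y in rows}:
        xs = [x for x, y in rows if y == label]
        dims = list(zip(*xs))  # one tuple of values per feature
        means = [sum(d) / len(d) for d in dims]
        variances = [
            sum((v - m) ** 2 for v in d) / len(d) + 1e-9  # avoid zero variance
            for d, m in zip(dims, means)
        ]
        model[label] = (len(xs) / len(rows), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior under the fitted model."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def benchmark(train_fn, predict_fn, sizes, test_size=500):
    """Record training time and held-out accuracy for each training-set size."""
    test = simulate(test_size, seed=999)
    results = []
    for n in sizes:
        train = simulate(n, seed=n)
        start = time.perf_counter()
        model = train_fn(train)
        elapsed = time.perf_counter() - start  # training time only
        correct = sum(predict_fn(model, x) == y for x, y in test)
        results.append((n, elapsed, correct / len(test)))
    return results


if __name__ == "__main__":
    for n, seconds, acc in benchmark(train_naive_bayes, predict, [1000, 5000, 20000]):
        print(f"n={n:>6}  train_time={seconds:.4f}s  accuracy={acc:.3f}")
```

Timing only the call to `train_fn` keeps data generation and evaluation out of the training-time figure, so the two indicators are measured separately at each data size.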
