Performance Evaluation of Classification Algorithms on Different Data Sets

Objectives: Selecting the most appropriate classifier for a particular data set is generally difficult. This study therefore evaluates several existing classifiers on several data sets to assess their performance. Methods/Statistical Analysis: The choice of classification technique, such as Naive Bayes (NB), Decision Tree (DT), Lazy Classifiers (LC), or Support Vector Machine, usually depends on the type and nature of the attributes in the data set. A wrong choice of classification technique can lead to wrong results and poor performance, which is the motivation behind this study. A data set usually consists of nominal attributes, numeric attributes, or mixed attributes (both numeric and nominal). In this paper, three of the most popular classification techniques, NB, DT, and LC, are applied to different types of data sets to evaluate their performance. Findings: The results reveal that the NB classifier performs well on both mixed-attribute data and numeric data, whereas the decision tree classifier performs better on nominal-attribute data. The lazy classifier's performance is only average for all kinds of data. Application/Improvements: The results of this study will help in understanding the performance of different classification techniques on different data sets. Further, the results can be used to select the best classification technique among NB, decision tree, and lazy classifiers for a given type of data set.
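The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' experimental setup: the toy nominal data set, the function names, the use of 1-nearest-neighbour with Hamming distance as a stand-in for a lazy classifier, and leave-one-out accuracy as the performance measure are all assumptions made for illustration. The decision tree is omitted to keep the example short.

```python
from collections import Counter, defaultdict

# Hypothetical toy data set with nominal attributes only: (outlook, windy) -> label.
DATA = [
    (("sunny", "no"), "play"),
    (("sunny", "yes"), "stay"),
    (("rainy", "yes"), "stay"),
    (("rainy", "no"), "stay"),
    (("overcast", "no"), "play"),
    (("overcast", "yes"), "play"),
    (("sunny", "no"), "play"),
    (("rainy", "yes"), "stay"),
]


def naive_bayes_predict(train, x):
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    distinct = defaultdict(set)          # attribute index -> values seen in training
    for features, label in train:
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            distinct[i].add(value)

    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / len(train)  # class prior
        for i, value in enumerate(x):
            k = len(distinct[i] | {value})  # number of possible values, for smoothing
            score *= (value_counts[(label, i)][value] + 1) / (n_c + k)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def nearest_neighbour_predict(train, x):
    """Lazy classifier stand-in: 1-nearest neighbour under Hamming distance."""
    def hamming(a, b):
        return sum(u != v for u, v in zip(a, b))

    _, label = min(train, key=lambda row: hamming(row[0], x))
    return label


def leave_one_out_accuracy(predict, data):
    """Hold out each instance in turn, train on the rest, and count correct predictions."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, x) == y
    return hits / len(data)


if __name__ == "__main__":
    for name, predict in [("NB", naive_bayes_predict), ("1-NN", nearest_neighbour_predict)]:
        print(name, leave_one_out_accuracy(predict, DATA))
```

Running the same loop over nominal, numeric, and mixed-attribute data sets, with a distance and likelihood model suited to each attribute type, would give the per-data-type comparison the abstract describes.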
