Comparing the Decision Tree Method Across Three Data Mining Software Packages

As a result of growth in information technology and in the methods for producing and collecting data, data can now be warehoused much faster than in the past. Knowledge discovery tools are therefore required to make use of these data. Data mining is typically employed as an advanced tool for data analysis and knowledge discovery, and its purpose is to build models that support decision making. Such models can predict future behavior from the analysis of past data, and this kind of predictive modeling is one of the most exciting areas of machine learning and adaptive computation. Statistical analysis of the data combines a variety of techniques, artificial intelligence algorithms, and data quality information. Numerous data mining applications, both commercial and open source, are currently available. In this research, we introduce data mining and the principal concepts of the decision tree method, one of the most effective and widely used classification methods. We also give a brief description of three data mining software packages, namely \textit{SPSS-Clementine}, \textit{RapidMiner} and \textit{Weka}. To illustrate the procedure of this research, we then compare the classification accuracy of three different decision tree algorithms on 3515 real datasets. The most accurate algorithm is \emph{Decision Tree} in \emph{RapidMiner}, with an accuracy of 92.49\%.
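To make the decision tree idea and the accuracy criterion used in the comparison more concrete, the following is a minimal sketch in Python of an ID3-style learner that chooses splits by information gain on categorical attributes. It is an illustrative assumption only: it is not the algorithm implemented by \textit{SPSS-Clementine}, \textit{RapidMiner} or \textit{Weka} (for example, Weka's J48 is a C4.5 implementation with pruning), and the toy weather records and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Grow a tree: a leaf is a class label, an inner node is (attr, {value: subtree}, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return (best, branches, majority)


def predict(tree, row):
    """Follow branches until a leaf; unseen attribute values fall back to the node's majority class."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


def accuracy(tree, rows, labels):
    """Classification accuracy: fraction of records whose predicted class matches the true class."""
    correct = sum(predict(tree, r) == lab for r, lab in zip(rows, labels))
    return correct / len(labels)


# Hypothetical toy records, not data from this study.
data = [
    ("sunny", "no", "no"), ("sunny", "yes", "no"),
    ("overcast", "no", "yes"), ("overcast", "yes", "yes"),
    ("rain", "no", "yes"), ("rain", "yes", "no"),
]
rows = [{"outlook": o, "windy": w} for o, w, _ in data]
labels = [play for _, _, play in data]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(f"training accuracy: {accuracy(tree, rows, labels):.2%}")  # prints: training accuracy: 100.00%
```

The three tools report the same kind of figure, the fraction of correctly classified records, although their tree learners differ in split criteria and pruning. On real data the accuracy should be measured on held-out records rather than on the training set, as is done in this toy run.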
