Prediction accuracy analysis with logistic regression and CART decision tree

Classification is one of the most important techniques in machine learning. Logistic regression and decision trees are two efficient supervised learning algorithms for classification problems. In this paper, we tested logistic regression and CART decision tree algorithms on different datasets. The experimental results showed that the CART decision tree performs much better on datasets with more attributes and a slightly imbalanced data distribution, whereas logistic regression is more accurate on datasets with fewer attributes and a balanced data distribution.
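The kind of comparison described above can be sketched as follows. This is only an illustrative setup, not the authors' experimental protocol: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the paper's datasets, and takes scikit-learn's `DecisionTreeClassifier` (an optimized CART variant) as a stand-in for CART. The helper name `compare` and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare(n_features, minority_share, seed=0):
    """Mean 5-fold accuracy of both models on one synthetic dataset.

    n_features     -- number of attributes (assumed knob, not from the paper)
    minority_share -- fraction of samples in the smaller class
    """
    X, y = make_classification(
        n_samples=1000,
        n_features=n_features,
        n_informative=max(2, n_features // 2),
        n_redundant=0,
        weights=[1.0 - minority_share, minority_share],
        random_state=seed,
    )
    models = {
        # Scaling helps the logistic regression solver converge.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        # Gini impurity is the splitting criterion usually associated with CART.
        "cart": DecisionTreeClassifier(criterion="gini", random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Few attributes, balanced classes vs. many attributes, slight imbalance.
    print("4 attributes, 50/50 split:", compare(4, 0.5))
    print("30 attributes, 65/35 split:", compare(30, 0.35))
```

Plain accuracy can be misleading on imbalanced data, so passing `scoring="balanced_accuracy"` to `cross_val_score` is a reasonable variation on this sketch. Results on synthetic data need not reproduce the paper's findings.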
