An up-to-date comparison of state-of-the-art classification algorithms

An up-to-date report on the accuracy and efficiency of state-of-the-art classifiers. We compare the accuracy of 11 classification algorithms pairwise and groupwise, and examine training, parameter-tuning, and testing time separately. GBDT and Random Forests yield the highest accuracy, outperforming SVM; GBDT is the fastest in testing and Naive Bayes the fastest in training.

Current benchmark reports of classification algorithms generally cover common classifiers and their variants but omit many algorithms introduced in recent years. Moreover, important properties such as the dependency on the number of classes and features, as well as CPU running time, are typically not examined. In this paper, we carry out a comparative empirical study of both established and more recently proposed classifiers on 71 data sets from different domains, publicly available in the UCI and KEEL repositories. The 11 algorithms studied include Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. We find that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF) while being the fastest algorithm in prediction. ELM also yields good accuracy, ranking in the top five alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training efficiency. DL is the worst performer in accuracy but the second fastest in prediction. SRC shows good accuracy but is the slowest classifier in both training and testing.
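To make the timing protocol concrete, here is a minimal sketch of how training and prediction time can be measured for three of the compared classifiers. It assumes scikit-learn and a small bundled data set (load_digits), neither of which comes from the paper; the actual experiments cover 11 algorithms, 71 UCI/KEEL data sets, and a separate parameter-tuning phase, none of which this loop reproduces.

```python
# Minimal benchmark sketch (assumes scikit-learn): time training and
# prediction, and record test accuracy, for three of the compared classifiers.
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

classifiers = {
    "GBDT": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", gamma="scale"),
}

for name, clf in classifiers.items():
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)        # training time
    t1 = time.perf_counter()
    acc = clf.score(X_test, y_test)  # prediction over the held-out test set
    t2 = time.perf_counter()
    print(f"{name}: accuracy={acc:.3f} train={t1 - t0:.2f}s test={t2 - t1:.2f}s")
```

Placing separate timers around fit and score mirrors the paper's distinction between training and testing time; parameter tuning would add a third, usually much larger, timed phase (e.g. a grid search) before the final fit.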
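The "pairwise and groupwise" accuracy comparisons mentioned above are, in studies of this kind, commonly carried out with non-parametric tests over per-data-set results: the Friedman test for the whole group of classifiers and the Wilcoxon signed-rank test for individual pairs. The sketch below assumes SciPy and uses hypothetical placeholder accuracies purely for illustration; it is not the paper's data, nor necessarily its exact test suite.

```python
# Sketch of groupwise (Friedman) and pairwise (Wilcoxon signed-rank)
# comparisons over per-data-set accuracies. All numbers are placeholders.
from scipy.stats import friedmanchisquare, wilcoxon

# acc_<clf>[i] = accuracy of that classifier on data set i (hypothetical)
acc_gbdt = [0.91, 0.88, 0.95, 0.87, 0.92, 0.90]
acc_rf   = [0.90, 0.87, 0.94, 0.88, 0.91, 0.89]
acc_svm  = [0.89, 0.85, 0.93, 0.86, 0.90, 0.88]

# Groupwise: do the classifiers differ at all across data sets?
stat, p = friedmanchisquare(acc_gbdt, acc_rf, acc_svm)
print(f"Friedman: chi2={stat:.2f}, p={p:.3f}")

# Pairwise: does GBDT differ significantly from SVM?
stat, p = wilcoxon(acc_gbdt, acc_svm)
print(f"Wilcoxon GBDT vs. SVM: W={stat:.1f}, p={p:.3f}")
```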
