Benchmarking Analysis of the Accuracy of Classification Methods Related to Entropy

The machine learning literature offers numerous methods for solving classification problems. We propose two new performance measures for analyzing such methods. These measures are defined using the concept of proportional reduction of classification error with respect to three benchmark classifiers: the random classifier and two intuitive classifiers, which are based on how a non-expert could perform classification simply by applying a frequentist approach. We show that these three simple methods are closely related to different aspects of the entropy of the dataset. Therefore, these measures account, to some extent, for the entropy of the dataset when evaluating the performance of classifiers. This allows us to measure the improvement in classification results over simple methods and, at the same time, how entropy affects classification capacity. To illustrate how these new performance measures can be used to analyze classifiers while taking the entropy of the dataset into account, we first carry out an intensive experiment using the well-known J48 algorithm and a UCI repository dataset for which we have previously selected a subset of the most relevant attributes. We then carry out an extensive experiment in which we consider four heuristic classifiers and 11 datasets.
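The proportional-reduction-in-error idea behind these measures can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' definitions. We assume that the random benchmark guesses uniformly among the K observed classes, and that the two intuitive benchmarks are a hypothetical "modal" classifier (always predicts the most frequent class) and a hypothetical "proportional" classifier (draws each prediction according to the observed class frequencies p_i). Under these assumptions the expected benchmark accuracies are 1/K, max p_i and sum p_i^2. Their negative base-2 logarithms are the Hartley entropy, the min-entropy and the Rényi entropy of order 2 of the class distribution, which is one way such benchmarks can be tied to entropy. All function names are our own.

```python
import math
from collections import Counter


def class_frequencies(labels):
    """Relative frequency p_i of each class label in the dataset."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def benchmark_errors(labels):
    """Expected error rates of three simple benchmark classifiers.

    ASSUMED benchmarks (hypothetical, not taken from the paper):
      uniform:      guesses each of the K classes with probability 1/K
      modal:        always predicts the most frequent class
      proportional: guesses class i with probability p_i
    """
    p = class_frequencies(labels).values()
    k = len(p)
    return {
        "uniform": 1 - 1 / k,
        "modal": 1 - max(p),
        "proportional": 1 - sum(q * q for q in p),
    }


def proportional_error_reduction(classifier_error, benchmark_error):
    """PRE = (E_benchmark - E_classifier) / E_benchmark.

    1 means no errors remain, 0 means no better than the benchmark,
    and a negative value means worse than the benchmark.
    """
    if benchmark_error <= 0:
        raise ValueError("benchmark makes no errors; PRE is undefined")
    return (benchmark_error - classifier_error) / benchmark_error


def entropy_views(labels):
    """Entropies (in bits) implied by each benchmark's expected accuracy."""
    errors = benchmark_errors(labels)
    return {
        "hartley": -math.log2(1 - errors["uniform"]),          # log2 K
        "min_entropy": -math.log2(1 - errors["modal"]),        # -log2 max p_i
        "renyi_2": -math.log2(1 - errors["proportional"]),     # -log2 sum p_i^2
    }


if __name__ == "__main__":
    labels = ["a"] * 6 + ["b"] * 3 + ["c"]  # p = 0.6, 0.3, 0.1
    for name, e in benchmark_errors(labels).items():
        # a hypothetical classifier with 80% accuracy (error 0.2)
        pre = proportional_error_reduction(0.2, e)
        print(name, round(e, 3), round(pre, 3))
    # uniform 0.667 0.7
    # modal 0.4 0.5
    # proportional 0.54 0.63
    print({k: round(v, 3) for k, v in entropy_views(labels).items()})
    # {'hartley': 1.585, 'min_entropy': 0.737, 'renyi_2': 1.12}
```

In this toy example the same 80% accuracy looks strongest against the uniform benchmark and weakest against the modal one, which is the kind of dataset-dependent contrast the measures are meant to expose. The min-entropy never exceeds the order-2 Rényi entropy, which never exceeds the Hartley entropy, so the modal benchmark is always the hardest of the three to beat.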
