Music rhythm tree based partitioning approach to decision tree classifier

Abstract: The decision tree is a widely used non-parametric technique in machine learning, data mining, and pattern recognition. It is simple to understand and interpret, but it struggles with high-dimensional and class-imbalanced datasets, over-fitting, and instability. To address some of these issues, the literature uses vertical partitioning approaches such as serial partitioning and theme-based partitioning. A vertical partitioning approach divides the feature set into subsets of features (blocks) and uses these subsets for subsequent tasks. In this work, we use ideas from the music rhythm tree to propose a novel vertical partitioning technique. It orders the features by their average correlation strength before partitioning the feature set. The proposed method achieves, on average, 13.8%, 6%, 9.8%, 19.7%, 9.4%, and 29.4% higher classification accuracy than C4.5, Random Forest, Bagging, AdaBoost, an ensemble technique, and a vertical partitioning technique, respectively. Our empirical results on 15 datasets demonstrate that the proposed vertical partitioning method is more stable and handles class-imbalanced data better. Finally, several popular statistical tests are conducted to validate the statistical significance of the results.
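The ordering and partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' rhythm-tree algorithm: the abstract does not specify how the rhythm tree determines block boundaries, so the sketch assumes Pearson correlation, a descending order of mean absolute correlation, and contiguous blocks of near-equal size. The function names `pearson`, `order_by_avg_correlation`, and `partition` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def order_by_avg_correlation(columns):
    """Sort feature indices by descending mean |correlation| with the other features.

    Assumption: "average correlation strength" is read here as the mean
    absolute Pearson correlation; the paper may define it differently.
    """
    m = len(columns)
    strength = []
    for i in range(m):
        others = [abs(pearson(columns[i], columns[j])) for j in range(m) if j != i]
        strength.append(sum(others) / len(others) if others else 0.0)
    return sorted(range(m), key=lambda i: (-strength[i], i))


def partition(ordered, n_blocks):
    """Split ordered feature indices into contiguous blocks of near-equal size.

    Assumption: equal-sized blocks stand in for the rhythm-tree split,
    which the abstract does not describe.
    """
    size, extra = divmod(len(ordered), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(ordered[start:end])
        start = end
    return blocks


# Toy data: one list per feature (column-wise layout).
columns = [
    [1, 2, 3, 4],    # feature 0
    [2, 4, 6, 8],    # feature 1, perfectly correlated with feature 0
    [1, -1, 1, -1],  # feature 2, only weakly correlated with the rest
    [4, 3, 2, 1],    # feature 3, perfectly anti-correlated with feature 0
]
order = order_by_avg_correlation(columns)
blocks = partition(order, n_blocks=2)
```

In a vertical partitioning classifier, each block would then train its own decision tree, and the per-block predictions would be combined, for example by majority voting.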
