Oblique Decision Tree Ensemble via Twin Bounded SVM

Abstract: Ensemble methods built on a “perturb and combine” strategy have shown improved performance on classification problems. Recently, the random forest algorithm was ranked first among 179 classifiers evaluated on 121 UCI datasets. Motivated by this, we propose a new approach for generating oblique decision trees. At each non-leaf node, the training samples are grouped into two categories based on the Bhattacharyya distance computed over a randomly selected feature subset. A twin bounded support vector machine (TBSVM) is then used to obtain two clustering hyperplanes such that each hyperplane is close to the data points of one group and as far as possible from the data points of the other group. Each non-leaf node is split according to these hyperplanes to grow the decision tree. We use different base models, namely random forest (RaF), rotation forest (RoF) and random sub rotation forest (RRoF), to generate oblique decision tree forests named TBRaF, TBRoF and TBRRoF, respectively. In earlier oblique decision trees, such as those based on the multisurface proximal support vector machine (MPSVM), the matrices involved are positive semi-definite, so additional regularization methods are required. In contrast, the matrices in the proposed TBRaF, TBRoF and TBRRoF are positive definite, so no explicit regularization technique needs to be applied to the primal problems. We evaluated the performance of the proposed models (TBRaF, TBRoF and TBRRoF) on 49 datasets taken from the UCI repository and on some real-world biological datasets (not in UCI). The experimental results and statistical tests show that TBRaF and TBRRoF outperform the other baseline methods.
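
The node-level steps described in the abstract (grouping classes by Bhattacharyya distance on a feature subset, then routing samples by their distance to two hyperplanes) can be illustrated with a minimal Python sketch. Several parts are assumptions and are not taken from the paper: each class is modelled as a Gaussian, a small ridge term `reg` is added to the covariances for numerical stability, the grouping rule (take the two most separated classes as seeds and attach every other class to the nearer seed) is one plausible reading of the abstract, and the two hyperplanes `(w1, b1)` and `(w2, b2)` are assumed to come from a TBSVM solver that is not shown. The function names are hypothetical.

```python
import numpy as np


def bhattacharyya_distance(X1, X2, reg=1e-6):
    """Bhattacharyya distance between Gaussian fits to two sample sets.

    Assumption: each set has at least two samples; `reg` is a small ridge
    added to each covariance so the log-determinants stay finite.
    """
    d = X1.shape[1]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False)) + reg * np.eye(d)
    S2 = np.atleast_2d(np.cov(X2, rowvar=False)) + reg * np.eye(d)
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    # Mean term: (1/8) * diff^T S^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(S, diff)
    # Covariance term: (1/2) * ln(det S / sqrt(det S1 * det S2))
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def group_classes(X, y, feature_idx):
    """Split the class labels at a node into two groups (assumed rule).

    The two classes with the largest pairwise distance seed the groups;
    every remaining class joins the seed it is closer to. Requires at
    least two distinct classes in `y`.
    """
    Xs = X[:, feature_idx]
    classes = np.unique(y)
    k = len(classes)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = bhattacharyya_distance(
                Xs[y == classes[i]], Xs[y == classes[j]]
            )
    a, b = np.unravel_index(np.argmax(D), D.shape)
    group_a, group_b = [classes[a]], [classes[b]]
    for c in range(k):
        if c == a or c == b:
            continue
        (group_a if D[c, a] <= D[c, b] else group_b).append(classes[c])
    return group_a, group_b


def route_samples(X, w1, b1, w2, b2):
    """Return True for samples nearer to hyperplane 1 than to hyperplane 2.

    The hyperplanes w^T x + b = 0 are assumed to be given, for example by
    a TBSVM solver (not implemented here).
    """
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return d1 <= d2
```

In a forest, `feature_idx` would typically be drawn at random at each node, for example with `np.random.default_rng().choice(n_features, size=m, replace=False)`; the subset size `m` is a tuning choice that this sketch leaves open.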
