Sparse alternating decision tree

Alternating decision tree (ADTree) brings interpretability to boosting. A novel sparse version of the multivariate ADTree is presented. Sparse ADTree generalizes the existing univariate ADTree. The decision nodes are designed based on a modified sparse discriminant analysis. The complexity of the decision nodes can be regularized easily.

Alternating decision tree (ADTree) is a special decision tree representation that brings interpretability to boosting, a well-established ensemble algorithm. It has found success in a wide range of applications. However, existing variants of ADTree implement univariate decision nodes, so potential interactions between features are ignored. To date, there has been no multivariate ADTree. We propose a sparse multivariate ADTree that remains comprehensible. The proposed sparse ADTree is empirically tested on UCI datasets as well as spectral datasets from the University of Eastern Finland (UEF). We show that sparse ADTree is competitive against both univariate decision trees (original ADTree, C4.5, and CART) and multivariate decision trees (Fisher's decision tree and a single multivariate decision tree from oblique Random Forest). It achieves the best average rank in prediction accuracy, the second-best in decision tree size, and a faster induction time than the original ADTree. In addition, it performs especially well on datasets with correlated features, such as the UEF spectral datasets. Thus, the proposed sparse ADTree extends the applicability of ADTree to a wider variety of applications.
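To make the idea of a sparse multivariate decision node concrete, the following is a minimal Python sketch that fits an elastic-net-penalized discriminant direction and uses it as a single split, in the spirit of sparse discriminant analysis via penalized optimal scoring. The function names, the substitution of scikit-learn's ElasticNet for the modified sparse discriminant analysis, and the mid-point threshold rule are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of a sparse multivariate decision node (illustrative only):
# a sparse discriminant direction is obtained by elastic-net regression of the
# class indicator on the features, so irrelevant feature weights shrink to zero
# and the multivariate split stays interpretable.
import numpy as np
from sklearn.linear_model import ElasticNet


def fit_sparse_node(X, y, alpha=0.1, l1_ratio=0.9):
    """Fit a sparse discriminant direction for a binary split.

    X : (n_samples, n_features) feature matrix
    y : (n_samples,) labels in {-1, +1}
    Returns a sparse weight vector w and a threshold t; a sample x is routed
    to the 'yes' branch when w @ x > t.
    """
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=True)
    enet.fit(X, y.astype(float))
    w = enet.coef_
    # Place the threshold midway between the projected class means.
    proj = X @ w
    t = 0.5 * (proj[y == 1].mean() + proj[y == -1].mean())
    return w, t


def split(X, w, t):
    """Route samples: True -> 'yes' child, False -> 'no' child."""
    return X @ w > t


if __name__ == "__main__":
    # Toy data: only the first two features carry the class signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    w, t = fit_sparse_node(X, y)
    print("non-zero weights at indices:", np.flatnonzero(w))
```

In a full ADTree induction, a node like this would be fitted to the boosting-weighted examples at each iteration, with the elastic-net penalty strength acting as the knob that regularizes node complexity; that weighting step is omitted here for brevity.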
