Inducing non-orthogonal and non-linear decision boundaries in decision trees via interactive basis functions

Abstract

Decision Trees (DTs) are a machine learning technique widely used for regression and classification. Conventionally, the decision boundaries of DTs are axis-parallel: each split is orthogonal to the axis of one of the features under consideration. A well-known limitation of this is that the algorithm may fail to find optimal partitions, or in some cases any partitions at all, depending on the underlying distribution of the data. To remedy this limitation, several modifications have been proposed that allow for oblique decision boundaries. The objective of this paper is to propose a new strategy for generating flexible decision boundaries by means of interactive basis functions (IBFs). We show how oblique decision boundaries can be obtained as a particular case of IBFs, and how non-linear decision boundaries can be induced as well. One attractive aspect of the proposed strategy is that training DTs with IBFs does not require custom software, since the functions can be precalculated for use in any existing implementation of the algorithm. Since the underlying mechanisms remain unchanged, there is no substantial computational overhead compared to conventional trees. Furthermore, IBFs can be used in any extension of the Decision Tree algorithm, such as evolutionary trees, boosting, and bagging. We conduct a benchmarking exercise to understand under which conditions the use of IBFs can improve model performance. In addition, we present three empirical applications that illustrate the approach in classification and regression. As part of discussing the empirical applications, we introduce a device called decision charts to facilitate the interpretation of DTs with IBFs. Finally, we conclude by outlining some directions for future research.
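The precalculation idea can be illustrated with a small sketch. The abstract does not give the functional form of the IBFs, so this example assumes two simple pairwise interactions purely for illustration: the sum x1 + x2 (a linear combination with unit weights, which a threshold split turns into an oblique boundary) and the product x1 * x2 (which a threshold split turns into a hyperbolic, non-linear boundary). The helper names best_stump and add_ibfs are hypothetical. best_stump is a minimal hand-written depth-one tree standing in for an existing DT implementation; it scores axis-aligned thresholds by misclassification count rather than by the impurity measures used in CART.

```python
from itertools import product


def best_stump(X, y):
    """Find the best single axis-aligned split (a depth-one tree).

    Returns (feature_index, threshold, errors), where each side of the split
    predicts its majority class and errors counts misclassified rows.
    """
    best = (None, None, len(y) + 1)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # Rows in the minority class on each side are misclassified
            errors = sum(min(side.count(0), side.count(1)) for side in (left, right))
            if errors < best[2]:
                best = (j, t, errors)
    return best


def add_ibfs(X):
    """Append two hypothetical IBF columns to each row.

    Resulting feature indices: 0 = x1, 1 = x2, 2 = x1 + x2, 3 = x1 * x2.
    """
    return [row + [row[0] + row[1], row[0] * row[1]] for row in X]


# A 10 x 10 integer grid over two features x1, x2
X = [[x1, x2] for x1, x2 in product(range(10), repeat=2)]

# Oblique target: class 1 when x1 + x2 > 9
y_oblique = [int(x1 + x2 > 9) for x1, x2 in X]
# Non-linear target: class 1 when x1 * x2 > 20
y_nonlinear = [int(x1 * x2 > 20) for x1, x2 in X]

raw_oblique = best_stump(X, y_oblique)
ibf_oblique = best_stump(add_ibfs(X), y_oblique)
raw_nonlinear = best_stump(X, y_nonlinear)
ibf_nonlinear = best_stump(add_ibfs(X), y_nonlinear)

if __name__ == "__main__":
    for name, result in [("raw, oblique", raw_oblique),
                         ("IBF, oblique", ibf_oblique),
                         ("raw, non-linear", raw_nonlinear),
                         ("IBF, non-linear", ibf_nonlinear)]:
        feature, threshold, errors = result
        print(f"{name}: feature {feature}, threshold {threshold}, errors {errors}")
```

Because the interactions are only extra columns, the same unmodified splitting routine finds a zero-error split on the precomputed sum (threshold 9.5) or product (threshold 20.5), while no single split on the raw features separates either target. In practice the augmented columns would be passed to an existing library implementation in the same way.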
