Logistic Model Trees

Tree induction methods and linear models are popular techniques for supervised learning, both for predicting nominal classes and for predicting numeric values. For numeric prediction, earlier work has combined the two schemes into ‘model trees’, i.e., trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea to classification problems, using logistic regression instead of linear regression. We construct the logistic regression models with a stagewise fitting process that selects relevant attributes in the data in a natural way, and we show how this process can build the models at the leaves by incrementally refining those constructed at higher levels of the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.
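
To make the idea of stagewise fitting with built-in attribute selection concrete, the following is a minimal sketch rather than the algorithm of the paper. It assumes a two-class problem and a LogitBoost-style additive update in which each stage fits a simple (single-attribute) weighted least-squares regression to the working response and keeps only the best attribute. The abstract does not specify these details, so the update rule, the clipping constants, the function names `fit_stagewise_logistic` and `predict_proba`, and the toy data are all illustrative assumptions.

```python
import math

def fit_stagewise_logistic(X, y, n_iter=30):
    """Stagewise additive logistic fit (assumed LogitBoost-style update).

    X: list of rows (lists of floats); y: list of 0/1 labels.
    Returns (intercept, coefs) of a linear function F(x), where
    P(y=1 | x) = 1 / (1 + exp(-2 F(x))).
    """
    n, d = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * d
    F = [0.0] * n
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        # Weights and working response; floor and clip for numerical stability.
        w = [max(pi * (1.0 - pi), 1e-6) for pi in p]
        z = [max(-4.0, min(4.0, (yi - pi) / wi)) for yi, pi, wi in zip(y, p, w)]
        sw = sum(w)
        zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        best = None
        for j in range(d):
            # Weighted simple linear regression of z on attribute j alone.
            xbar = sum(wi * row[j] for wi, row in zip(w, X)) / sw
            sxx = sum(wi * (row[j] - xbar) ** 2 for wi, row in zip(w, X))
            if sxx < 1e-12:
                continue  # constant attribute, nothing to fit
            sxz = sum(wi * (row[j] - xbar) * (zi - zbar)
                      for wi, row, zi in zip(w, X, z))
            b = sxz / sxx
            a = zbar - b * xbar
            err = sum(wi * (zi - a - b * row[j]) ** 2
                      for wi, row, zi in zip(w, X, z))
            if best is None or err < best[0]:
                best = (err, j, a, b)
        if best is None:
            break
        _, j, a, b = best
        # Only the selected attribute's coefficient changes in this stage.
        intercept += 0.5 * a
        coefs[j] += 0.5 * b
        F = [f + 0.5 * (a + b * row[j]) for f, row in zip(F, X)]
    return intercept, coefs

def predict_proba(intercept, coefs, row):
    """Estimated probability of class 1 for a single row."""
    f = intercept + sum(c * v for c, v in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-2.0 * f))

# Toy data (hypothetical): attribute 0 is informative, attribute 1 is not.
X = [[i / 50 - 1, ((i * 37) % 100) / 50 - 1] for i in range(100)]
y = [1 if row[0] > 0 else 0 for row in X]
for i in (45, 48, 52, 55):  # flip a few labels so the classes overlap
    y[i] = 1 - y[i]

intercept, coefs = fit_stagewise_logistic(X, y)
```

Because each stage updates a single coefficient, attributes that never give the best fit keep a coefficient of zero, which is one way a stagewise procedure can select attributes. In a tree setting, the same loop could in principle be resumed at a child node from the parent's `intercept` and `coefs` using only the child's instances, which is the kind of incremental refinement the abstract describes.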
