Simultaneous Threshold Interaction Detection in Binary Classification

The Classification Trunk Approach (CTA) is a method for the automatic selection of threshold interactions in generalized linear modelling (GLM). It arises from the integration of classification trees and GLM. Interactions between predictors are expressed as “threshold interactions” rather than as traditional cross-products. CTA differs from classification trees in its splitting criterion and is implemented in a new algorithm, STIMA, that can be used to estimate threshold interaction effects in classification and regression models. This paper focuses on the binary response case and presents an application to the Liver Disorders dataset, illustrating the advantages of CTA over other model-based and decision-tree-based approaches. The methods are compared in terms of prediction accuracy and model complexity.
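To make the distinction between a cross-product and a threshold interaction concrete, the following minimal sketch builds both kinds of term for a single observation. It is an illustration only, not the STIMA algorithm: STIMA selects split variables and cutpoints automatically, whereas here the variable names (`x1`, `x2`), the cutpoints, and all function names are hypothetical and fixed by hand.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]
# A condition is (variable, operator, cutpoint); operator is "<=" or ">".
Condition = Tuple[str, str, float]


def cross_product(row: Row, a: str, b: str) -> float:
    """Traditional interaction term: the product of two predictors."""
    return row[a] * row[b]


def threshold_indicator(row: Row, region: Sequence[Condition]) -> int:
    """Threshold interaction term: 1 if the row satisfies every
    condition defining the region, 0 otherwise."""
    for var, op, cut in region:
        if op == "<=":
            satisfied = row[var] <= cut
        elif op == ">":
            satisfied = row[var] > cut
        else:
            raise ValueError(f"unknown operator: {op}")
        if not satisfied:
            return 0
    return 1


def design_row(row: Row, main_effects: Sequence[str],
               regions: Sequence[Sequence[Condition]]) -> List[float]:
    """Intercept, main effects, then one indicator column per region.
    Such a row could be passed to any GLM fitting routine."""
    return ([1.0]
            + [row[v] for v in main_effects]
            + [float(threshold_indicator(row, r)) for r in regions])


# Hypothetical region: x1 <= 2 and x2 > 5 (cutpoints chosen by hand).
region = [("x1", "<=", 2.0), ("x2", ">", 5.0)]
obs_in = {"x1": 1.0, "x2": 6.0}
obs_out = {"x1": 3.0, "x2": 6.0}

row_in = design_row(obs_in, ["x1", "x2"], [region])
row_out = design_row(obs_out, ["x1", "x2"], [region])
```

The indicator column takes the value 1 only for observations falling inside the region, so its coefficient in a GLM would represent a shift in the linear predictor for that subgroup, in contrast to the cross-product, which varies continuously with both predictors.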
