Empirical Asymmetric Selective Transfer in Multi-objective Decision Trees

We consider learning tasks in which multiple target variables must be predicted. Two approaches have been used in this setting: (a) build a separate single-target model for each target variable, or (b) build a multi-target model that predicts all targets simultaneously; the latter may exploit dependencies among the targets. For a given target, either (a) or (b) may yield the more accurate model, which shows that exploiting information from the other targets can be beneficial as well as detrimental to accuracy. This raises the question of whether it is possible to find, for a given target (the main target), the subset of the other targets (the support targets) that, when combined with the main target in a multi-target model, yields the most accurate model for the main target. We propose Empirical Asymmetric Selective Transfer (EAST), a generally applicable algorithm that approximates such a subset. Applied to decision trees, EAST outperforms single-target decision trees, multi-target decision trees, and multi-target decision trees with target clustering.
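The abstract leaves the search procedure unspecified; the sketch below is one plausible reading, assuming a greedy forward selection of support targets driven by an empirical (e.g. cross-validated) error estimate for the main target. The function names and signature are illustrative, not the paper's API.

```python
def east_select(main, candidates, evaluate):
    """Greedy forward selection of support targets for one main target.

    `candidates` is the set of other targets; `evaluate(main, support)`
    is assumed to return an error estimate (e.g. via cross-validation)
    for a multi-target model trained on {main} | support and scored on
    the main target only. Both names are hypothetical.
    """
    support = set()
    best_err = evaluate(main, support)  # single-target baseline
    improved = True
    while improved:
        improved = False
        best_t = None
        for t in candidates - support:
            err = evaluate(main, support | {t})
            if err < best_err:  # keep t only if it helps the main target
                best_err, best_t = err, t
                improved = True
        if best_t is not None:
            support.add(best_t)
    return support, best_err
```

Such a search is asymmetric by construction: a target t may be selected as support for the main target without the main target being selected as support for t.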
