Hierarchical Multi-Task Learning: a Cascade Approach Based on the Notion of Task Relatedness

Multi-task learning can be shown to improve the generalization performance of single tasks under certain conditions. Typically, the algorithmic and theoretical analysis of multi-task learning deals with a two-level structure, including a group of tasks and a single task. In many situations, however, it is beneficial to consider varying degrees of relatedness among tasks, assuming that some tasks are closely related and should contribute more to the learning process, while other tasks are less related but can still contribute some information to the learning process. The extension of current approaches to the multi-level setting may not be trivial. We propose a general framework for a full hierarchical multi-task setting. We define an explicit notion of hierarchical task relatedness, where at each level we assume that some aspects of the learning problem are shared. We suggest a cascade approach, where at each level of the hierarchy a learner jointly learns the uniquely shared aspects of the tasks by finding a single shared hypothesis. This shared hypothesis is used to bootstrap the preceding level in the hierarchy, forming a hypothesis search space. We analyze sufficient conditions for our approach to reach optimality, and provide generalization guarantees in an empirical risk minimization setting.
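The cascade idea can be illustrated with a minimal sketch. This is a hypothetical toy construction, not the paper's algorithm: it models "bootstrapping the next level with the shared hypothesis" as ridge-style shrinkage of each child's fit toward its parent's jointly learned linear hypothesis. The function names (`ridge`, `cascade_fit`), the tree encoding, and the shrinkage parameter `tau` are all assumptions made for illustration.

```python
import numpy as np

def ridge(X, y, anchor, tau):
    # Least squares shrunk toward an anchor hypothesis; tau controls how
    # strongly the parent's shared hypothesis constrains this level's search.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + tau * np.eye(d),
                           X.T @ y + tau * anchor)

def cascade_fit(node, anchor=None, tau=1.0):
    """Recursively fit a hierarchy of tasks (illustrative sketch).

    node: either a leaf task (X, y) or a list of child nodes.
    At each internal node, data from all descendant tasks is pooled and a
    single shared hypothesis is learned, anchored at the parent's hypothesis;
    that hypothesis is then passed down to bootstrap the next level.
    Returns a nested structure of weight vectors mirroring `node`.
    """
    def pool(n):
        # Collect all leaf tasks under this node.
        if isinstance(n, tuple):
            return [n]
        return [t for child in n for t in pool(child)]

    tasks = pool(node)
    X = np.vstack([t[0] for t in tasks])
    y = np.concatenate([t[1] for t in tasks])
    if anchor is None:                       # root level: no parent hypothesis
        anchor = np.zeros(X.shape[1])
    shared = ridge(X, y, anchor, tau)        # jointly learned shared hypothesis
    if isinstance(node, tuple):              # leaf: this task's final hypothesis
        return shared
    return [cascade_fit(child, anchor=shared, tau=tau) for child in node]
```

Two closely related tasks placed under one node will, under this sketch, each end up near the pooled hypothesis, while a larger `tau` ties them more tightly to the shared solution, mimicking a higher assumed degree of relatedness.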
