Task-group Relatedness and Generalization Bounds for Regularized Multi-task Learning

In this paper, we study the generalization performance of regularized multi-task learning (RMTL) in a vector-valued framework, where multi-task learning (MTL) is treated as a learning process for vector-valued functions. We are mainly concerned with two theoretical questions: 1) under what conditions does RMTL perform better than single-task learning (STL) with a smaller per-task sample size? 2) under what conditions does RMTL generalize, guaranteeing the consistency of every task when all tasks are learned simultaneously? In particular, we investigate two types of task-group relatedness: the observed discrepancy-dependence measure (ODDM) and the empirical discrepancy-dependence measure (EDDM), both of which detect the dependence between two groups of multiple related tasks (MRTs). We then introduce the Cartesian product-based uniform entropy number (CPUEN) to measure the complexity of vector-valued function classes. By applying specific deviation and symmetrization inequalities in the vector-valued framework, we obtain a generalization bound for RMTL, which upper-bounds the joint probability of the event that at least one task has a large discrepancy between its expected and empirical risks. Finally, we present a sufficient condition guaranteeing the consistency of each task in the simultaneous learning process, and we discuss how task relatedness affects the generalization performance of RMTL. These theoretical findings answer the two questions posed above.
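As a concrete illustration of the bound's form (schematic notation only, not the paper's exact statement: K tasks, n samples per task, a vector-valued hypothesis class \mathcal{F}, a loss \ell, and a tolerance \xi), the quantity being upper-bounded is

\[
\Pr\Big\{ \exists\, k \in \{1,\dots,K\} :\; \sup_{f \in \mathcal{F}} \Big| \mathbb{E}\,\ell\big(f_k(X^{(k)}), Y^{(k)}\big) - \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_k(x_i^{(k)}), y_i^{(k)}\big) \Big| > \xi \Big\} \;\le\; \varepsilon\big(n, K, \xi, \mathcal{F}\big),
\]

where the rate \varepsilon is controlled by the complexity of \mathcal{F} (measured here via the CPUEN), and simultaneous consistency of all tasks follows whenever \varepsilon \to 0 as n \to \infty.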
