A Convex Formulation for Learning a Shared Predictive Structure from Multiple Tasks

In this paper, we consider the problem of learning from multiple related tasks to improve generalization performance by extracting their shared structure. The alternating structure optimization (ASO) algorithm, which couples all tasks through a shared feature representation, has been applied successfully to various multitask learning problems. However, ASO is nonconvex, and the alternating algorithm finds only a local solution. We first present an improved ASO formulation (iASO) for multitask learning based on a new regularizer. We then convert iASO, a nonconvex formulation, into a relaxed convex one (rASO). Our theoretical analysis reveals that, under certain conditions, rASO finds a globally optimal solution to its nonconvex counterpart iASO. rASO can be equivalently reformulated as a semidefinite program (SDP), which, however, does not scale to large datasets. We therefore propose two alternatives for finding the globally optimal solution to rASO, the block coordinate descent (BCD) method and the accelerated projected gradient (APG) algorithm, and we develop efficient algorithms for the key subproblems involved in each. Experiments on the Yahoo webpages datasets and the Drosophila gene expression pattern image datasets demonstrate the effectiveness and efficiency of the proposed algorithms and confirm our theoretical analysis.
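
To make the accelerated projected gradient idea concrete, the following is a minimal, generic sketch of a Nesterov-style APG loop in Python. It is not the rASO solver described in the paper. The toy quadratic objective, the box constraint, the step size, and every name in the code (`accelerated_projected_gradient`, `project_box`, `Q`, `c`) are assumptions chosen for illustration. In the rASO setting the projection step would instead target the paper's own feasible set, which is one of the key subproblems mentioned above.

```python
import math


def accelerated_projected_gradient(grad, project, x0, step, n_iter=100):
    """Generic Nesterov-style accelerated projected gradient (illustrative sketch).

    grad    -- function returning the gradient of the smooth objective
    project -- Euclidean projection onto the convex feasible set
    step    -- fixed step size, assumed to be at most 1/L for an L-smooth objective
    """
    x = project(list(x0))
    y = list(x)   # extrapolated "search point"
    t = 1.0       # momentum parameter
    for _ in range(n_iter):
        g = grad(y)
        # Gradient step from the search point, then project back onto the set.
        x_next = project([yi - step * gi for yi, gi in zip(y, g)])
        # Standard momentum update for the extrapolation weight.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_next
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_next, x)]
        x, t = x_next, t_next
    return x


# Hypothetical toy problem: minimize 0.5 * x^T Q x - c^T x over the box [0, 1]^2.
Q = [[2.0, 0.5], [0.5, 1.0]]
c = [3.0, -1.0]


def grad(x):
    # Gradient of the quadratic objective: Q x - c.
    return [sum(Q[i][j] * x[j] for j in range(2)) - c[i] for i in range(2)]


def project_box(x):
    # Euclidean projection onto [0, 1]^2 is coordinate-wise clipping.
    return [min(max(v, 0.0), 1.0) for v in x]


# The largest eigenvalue of Q is about 2.21, so step = 0.4 satisfies step <= 1/L.
x_hat = accelerated_projected_gradient(grad, project_box, [0.0, 0.0], step=0.4)
print(x_hat)
```

For this toy problem the unconstrained minimizer lies outside the box, so the iterates settle on the boundary point that satisfies the optimality conditions of the constrained problem. The loop itself depends only on a gradient oracle and a projection oracle, which is why an efficient projection matters so much in practice.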
