Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction

As an effective learning paradigm against insufficient training samples, multi-task learning (MTL) encourages knowledge sharing across multiple related tasks so as to improve overall performance. A major challenge in MTL is negative transfer: sharing knowledge with dissimilar or hard tasks often degrades performance. Although a substantial number of studies have addressed negative transfer, most existing methods model the transfer relationship only at the task level and leave the transfer across features and tasks unconsidered. In contrast, our goal is to alleviate negative transfer collaboratively at both the task level and the feature level. Specifically, we propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL), which leverages the collaborative grouping of features and tasks while suppressing inter-group knowledge sharing. We then propose an optimization method for TFCL with a global convergence guarantee. Moreover, we show that the intermediate procedures of the method not only reveal how the transfer across features and tasks takes place, but also shed some light on the task-feature grouping effect. As a practical extension, we apply TFCL to the personalized attribute prediction problem with fine-grained modeling of user behaviors.
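To make the idea of collaborative task-feature grouping concrete, the following is a minimal illustrative sketch, not the TFCL algorithm itself. It assumes a hypothetical weight matrix `W` whose rows are features and whose columns are tasks, treats every nonzero weight as an edge in a bipartite feature-task graph, and reads the groups off as connected components. The function name `task_feature_groups` and the tolerance `tol` are assumptions introduced here for illustration only.

```python
def task_feature_groups(W, tol=1e-12):
    """Group features (rows) and tasks (columns) that are linked by nonzero weights.

    Illustrative only: W is a hypothetical feature-by-task weight matrix,
    and groups are the connected components of its bipartite support graph.
    """
    n_features, n_tasks = len(W), len(W[0])
    # Nodes 0..n_features-1 are features; the rest are tasks.
    parent = list(range(n_features + n_tasks))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge a feature and a task whenever the task uses that feature.
    for i in range(n_features):
        for j in range(n_tasks):
            if abs(W[i][j]) > tol:
                ri, rj = find(i), find(n_features + j)
                if ri != rj:
                    parent[ri] = rj

    # Collect (feature indices, task indices) per component.
    groups = {}
    for node in range(n_features + n_tasks):
        feats, tasks = groups.setdefault(find(node), ([], []))
        if node < n_features:
            feats.append(node)
        else:
            tasks.append(node - n_features)
    return sorted(groups.values())


# Hypothetical weights: features 0-1 serve tasks 0-1, feature 2 serves tasks 2-3.
W_example = [
    [0.9, 0.4, 0.0, 0.0],
    [0.2, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.1, 0.3],
]
print(task_feature_groups(W_example))
```

When the weight matrix is block-diagonal up to a permutation, as in this toy example, each group of tasks relies only on its own group of features, so no knowledge is shared between groups. A single nonzero entry linking two blocks merges them into one group.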
