Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent Task Assignment against Negative Transfer

In multi-task learning, a major challenge is the notorious issue of negative transfer: sharing knowledge with dissimilar or hard tasks often degrades performance. To circumvent this issue, we propose a novel multi-task learning method that simultaneously learns latent task representations and a block-diagonal Latent Task Assignment Matrix (LTAM). Unlike most previous work, pursuing the block-diagonal structure of the LTAM (which assigns latent tasks to output tasks) alleviates negative transfer by collaboratively grouping latent tasks and output tasks so that inter-group knowledge transfer and sharing is suppressed. This goal is challenging, since 1) our notion of the block-diagonal property extends the traditional notion for square matrices, where the $i$-th row and the $i$-th column represent the same concept; and 2) marginal constraints on rows and columns are also required to avoid isolated latent/output tasks. Facing these challenges, we propose a novel regularizer based on an equivalent spectral condition that realizes this generalized block-diagonal property. Practically, we provide a relaxation scheme that improves the flexibility of the model. With the objective function in hand, we then propose an alternating optimization method, which not only shows how negative transfer is alleviated in our method but also reveals an interesting connection between our method and the optimal transport problem. Finally, the method is demonstrated on a simulated dataset and three real-world benchmark datasets, and is further applied to personalized attribute prediction.
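To make the spectral condition concrete, the sketch below illustrates the standard graph-Laplacian characterization of block structure that this line of work builds on: a nonnegative assignment matrix is (up to row/column permutation) $k$-block-diagonal exactly when the bipartite graph linking its rows to its columns has at least $k$ connected components, i.e. when the $k$ smallest eigenvalues of the bipartite Laplacian are zero. This is a minimal illustration of that classical fact, not the paper's actual regularizer (which additionally enforces the marginal constraints and relaxation described above); the function names are ours.

```python
import numpy as np

def bipartite_laplacian(S):
    """Laplacian of the bipartite graph linking latent tasks (rows of S)
    to output tasks (columns of S); edge weights are the entries of S."""
    m, t = S.shape
    A = np.zeros((m + t, m + t))
    A[:m, m:] = S
    A[m:, :m] = S.T
    return np.diag(A.sum(axis=1)) - A

def block_diag_penalty(S, k):
    """Sum of the k smallest Laplacian eigenvalues. It vanishes iff the
    bipartite graph splits into at least k connected components, i.e.
    S is generalized k-block-diagonal up to permutation."""
    eigvals = np.linalg.eigvalsh(bipartite_laplacian(S))  # ascending order
    return eigvals[:k].sum()

# Two latent tasks assigned to disjoint pairs of output tasks: 2 blocks.
S = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
print(block_diag_penalty(S, 2))  # ~0: the two-block structure is exact
```

Minimizing such a penalty (in a suitably relaxed form) pushes the learned assignment toward $k$ disconnected latent/output task groups, which is precisely how inter-group sharing gets suppressed.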
