Bayesian Multi-Task Relationship Learning with Link Structure

In this paper, we study the multi-task learning problem from a new perspective: modeling the link structure of the data and the relationships between tasks simultaneously. We first introduce the Matrix Gaussian (MG) distribution and the Matrix Generalized Inverse Gaussian (MGIG) distribution, and then define a Matrix Gaussian Matrix Generalized Inverse Gaussian (MG-MGIG) prior. Based on this prior, we propose a novel multi-task learning algorithm, Bayesian Multi-task Relationship Learning (BMTRL). To incorporate link structure into the BMTRL framework, we introduce link constraints between samples. Combining BMTRL with these link constraints yields the Bayesian Multi-task Relationship Learning with Link Constraints (BMTRL-LC) algorithm. We further apply manifold theory to extend BMTRL-LC to data that has no link structure. Notably, BMTRL-LC is effective for multi-task learning when only limited training samples are available, a setting that is not addressed in the existing literature. To keep the computation tractable, we combine a convex optimization method with sampling techniques; specifically, we adopt two stochastic EM algorithms, one for BMTRL and one for BMTRL-LC. Experimental results on three real datasets demonstrate the promise of the proposed algorithms.
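As a rough illustration of two ingredients named above, the following Python sketch draws a sample from a Matrix Gaussian distribution and evaluates a simple pairwise link penalty. It is a sketch under stated assumptions, not the paper's formulation: the abstract gives neither the parameterization of the MG distribution nor the exact form of the link constraints, so the row/column covariance convention, the squared-difference penalty, and all function names (`sample_matrix_gaussian`, `link_penalty`, `graph_laplacian`) are hypothetical choices made here for illustration.

```python
import numpy as np


def sample_matrix_gaussian(M, U, V, rng):
    """Draw X with mean M, row covariance U (d x d), column covariance V (m x m).

    Assumed convention (not taken from the paper): cov(vec(X)) = kron(V, U).
    """
    A = np.linalg.cholesky(U)  # U = A A^T
    B = np.linalg.cholesky(V)  # V = B B^T
    Z = rng.standard_normal(M.shape)  # i.i.d. standard normal entries
    return M + A @ Z @ B.T


def link_penalty(f, links):
    """Sum of squared prediction differences over linked sample pairs.

    Illustrative stand-in for a link constraint: linked samples are
    encouraged to receive similar predictions.
    """
    return sum((f[i] - f[j]) ** 2 for i, j in links)


def graph_laplacian(n, links):
    """Unnormalized Laplacian of an undirected graph with n nodes."""
    L = np.zeros((n, n))
    for i, j in links:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 3 features, 2 tasks; identity covariances are placeholders only.
    W = sample_matrix_gaussian(np.zeros((3, 2)), np.eye(3), np.eye(2), rng)
    print(W.shape)

    f = np.array([1.0, 3.0, 0.0])  # hypothetical predictions for 3 samples
    links = [(0, 1), (1, 2)]       # hypothetical links between samples
    # The pairwise penalty equals the quadratic form f^T L f.
    print(link_penalty(f, links), f @ graph_laplacian(3, links) @ f)
```

Writing the penalty as the quadratic form f^T L f is what connects link constraints to manifold-style regularization: when the data has no explicit links, a graph built from nearest neighbors could supply `links` instead, though whether the paper constructs its extension this way is an assumption.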
