Learning Task Relatedness via Dirichlet Process Priors for Linear Regression Models

In this paper we present a hierarchical model of linear regression functions in the context of multi-task learning. The parameters of the linear model are coupled by a Dirichlet Process (DP) prior, which implies a clustering of related functions for different tasks. To make approximate Bayesian inference under this model we apply the Bayesian Hierarchical Clustering (BHC) algorithm. The experiments are conducted on two real-world problems: (i) school exam score prediction and (ii) prediction of ground-motion parameters. In comparison to baseline methods with no shared prior, the results show improved prediction performance when using the hierarchical model.

We consider K regression tasks with data sets D_k = {(x_{ki}, y_{ki})}_{i=1}^{n_k}, where n_k = |D_k| is the cardinality of the k-th data set, x_{ki} ∈ R^d is the i-th covariate of the k-th task and y_{ki} ∈ R is the corresponding target value. Furthermore, D = (X, y) denotes the complete data set with X = {X_k}_{k=1}^K and y = {y_k}_{k=1}^K. The aim in multi-task learning is to learn the function estimators f_k simultaneously so that information is shared among them. A common technique for sharing information across tasks is hierarchical Bayesian modelling (1, 5), which assumes that the model parameters are drawn from a common prior distribution. By learning these parameters jointly, the individual tasks interact and regularize each other. A drawback of such a prior, owing to its unimodality, is that the relationships between all tasks are treated equally; it is desirable that only similar tasks share information, so as to prevent negative transfer. To address these issues we propose a nonparametric hierarchical Bayesian model in which the common prior is drawn from a DP. The DP prior induces a partitioning of the tasks into a potentially infinite number of components, so that only similar tasks within each cluster share the same parameterization.
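As a concrete illustration of this generative view, the following sketch samples a task partition from the DP prior via its Chinese restaurant process (CRP) representation and then draws regression data so that tasks in the same cluster share one weight vector. All names and the specific hyperparameter values (K, d, n_k, alpha, noise_sd) are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_partition(num_tasks, alpha):
    """Sample a partition of tasks from a Chinese restaurant process,
    the predictive form of a DP prior over task parameters."""
    assignments = [0]          # first task opens the first cluster
    counts = [1]               # cluster occupation counts
    for _ in range(1, num_tasks):
        # join an existing cluster with prob. proportional to its size,
        # or open a new one with prob. proportional to alpha
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        z = rng.choice(len(probs), p=probs)
        if z == len(counts):
            counts.append(1)
        else:
            counts[z] += 1
        assignments.append(int(z))
    return assignments

# Generative model: tasks within one cluster share a weight vector.
K, d, n_k, alpha, noise_sd = 8, 3, 20, 1.0, 0.1
z = crp_partition(K, alpha)
cluster_weights = {c: rng.normal(size=d) for c in set(z)}

tasks = []
for k in range(K):
    X_k = rng.normal(size=(n_k, d))
    y_k = X_k @ cluster_weights[z[k]] + noise_sd * rng.normal(size=n_k)
    tasks.append((X_k, y_k))
```

Because the CRP is exchangeable, the number of clusters is not fixed in advance but grows (slowly, roughly as alpha * log K) with the number of tasks, which is exactly what allows "only similar tasks share information" without choosing a cluster count by hand.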
A similar approach was previously proposed in the context of classification by Roy and Kaelbling (3), using a naive Bayes classifier, and by Xue et al. (4), using logistic regression. Following Roy and Kaelbling, we apply the BHC algorithm (6) to perform inference in our model.

∗ This work was (partly) funded by the German Research Foundation, grant DFG RI 2037/2-1.
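To give a flavour of how clustering-based inference can proceed here, the sketch below greedily merges tasks whenever pooling their data increases the Bayesian linear regression log evidence. This is a simplified, greedy stand-in for BHC (which compares merge hypotheses with a DP-derived prior over tree-consistent partitions), and the prior/noise variances tau2 and sigma2 are illustrative assumptions.

```python
import numpy as np

def log_evidence(X, y, tau2=1.0, sigma2=0.1):
    """Log marginal likelihood of Bayesian linear regression with prior
    w ~ N(0, tau2 I) and noise variance sigma2, i.e.
    y ~ N(0, sigma2 I + tau2 X X^T)."""
    n = len(y)
    C = sigma2 * np.eye(n) + tau2 * X @ X.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def greedy_merge(tasks):
    """Agglomeratively merge tasks while the best pairwise merge raises
    the evidence relative to keeping the pair separate."""
    clusters = [([k], X_k, y_k) for k, (X_k, y_k) in enumerate(tasks)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                X_m = np.vstack([clusters[i][1], clusters[j][1]])
                y_m = np.concatenate([clusters[i][2], clusters[j][2]])
                # log Bayes factor: shared weights vs. independent weights
                gain = (log_evidence(X_m, y_m)
                        - log_evidence(clusters[i][1], clusters[i][2])
                        - log_evidence(clusters[j][1], clusters[j][2]))
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, i, j, X_m, y_m)
        if best is None:
            break  # no merge improves the evidence
        _, i, j, X_m, y_m = best
        members = clusters[i][0] + clusters[j][0]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append((members, X_m, y_m))
    return [c[0] for c in clusters]
```

The resulting partition plays the role of the DP-induced task clusters: tasks that end up in one cluster are fitted with a shared weight vector, while dissimilar tasks keep their own, limiting negative transfer.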