Scalable Multitask Representation Learning for Scene Classification
Supplementary Material

1. Implementation Details

In this section we discuss certain implementation details of our STL-SDCA and MTL-SDCA solvers. We begin with some notation and then give the technical details for each solver.

Notation: Let $\{(x_i, y_{it}) : 1 \le i \le n,\ 1 \le t \le T\}$ be the input/output pairs of the multitask learning problem, where $x_i \in \mathbb{R}^d$, $y_{it} \in \{\pm 1\}$, $T$ is the number of tasks, and $n$ is the number of training examples per task. We assume that all tasks share the same training examples, although this is easily generalized. The standard single task learning (STL) approach learns linear predictors $w_t$ in the original space $\mathbb{R}^d$. In contrast, the proposed multitask learning (MTL) method learns a matrix $U \in \mathbb{R}^{k \times d}$, which maps the original features $x_i$ into a new representation $z_i$ via $z_i = U x_i$. The linear predictors $w_t$ are then learned in the subspace $\mathbb{R}^k$. Let $X \in \mathbb{R}^{d \times n}$ be the matrix of stacked vectors $x_i$, $Z \in \mathbb{R}^{k \times n}$ the matrix of stacked vectors $z_i$, $Y \in \{\pm 1\}^{n \times T}$ the matrix of labels, and $W \in \mathbb{R}^{\cdot \times T}$ the matrix of stacked predictors $w_t$ (the dimensionality of $w_t$ will be clear from the context). We define the following kernel matrices: $K = K_X = X^\top X$, $K_Z = Z^\top Z$, and $M = K_W = W^\top W$. As mentioned in the paper, both solvers use precomputed kernel matrices and work with dual variables $\alpha_t \in \mathbb{R}^n$. We define $A \in \mathbb{R}^{n \times T}$ as the matrix of stacked dual variables for all tasks.

STL-SDCA: The STL optimization problem for a task $t$ is defined as follows:
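A minimal sketch of the form this objective takes in the SDCA framework of Shalev-Shwartz and Zhang, assuming a generic convex loss $\ell$ (the specific loss is an assumption here, not fixed by the text above):

\[
\min_{w_t \in \mathbb{R}^d} \; \frac{\lambda}{2} \lVert w_t \rVert^2 \;+\; \frac{1}{n} \sum_{i=1}^{n} \ell\big( y_{it} \, \langle w_t, x_i \rangle \big),
\]

where $\lambda > 0$ is the regularization parameter. SDCA maintains the primal-dual correspondence $w_t = \frac{1}{\lambda n} \sum_{i=1}^{n} \alpha_{it} \, x_i$, so every inner product $\langle w_t, x_i \rangle = \frac{1}{\lambda n} (K \alpha_t)_i$ needed by the coordinate updates can be computed from the precomputed kernel matrix $K$ and the dual variables $\alpha_t$ alone, which is what allows the solver to operate entirely on kernel matrices.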
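For concreteness, the following Python sketch shows how a single SDCA epoch for one task can run on a precomputed kernel matrix, using the closed-form hinge-loss coordinate step from Shalev-Shwartz and Zhang. The hinge loss, the function name, and the random update schedule are illustrative assumptions, not the authors' exact solver.

import numpy as np

def sdca_hinge_epoch(K, y, alpha, Ka, lam, rng):
    """One SDCA epoch over a precomputed kernel (illustrative sketch).

    K     : (n, n) kernel matrix, K[i, j] = <x_i, x_j>
    y     : (n,) labels in {-1, +1} for the current task t
    alpha : (n,) dual variables alpha_t, updated in place
    Ka    : (n,) running product K @ alpha, updated in place
    lam   : regularization parameter lambda
    """
    n = K.shape[0]
    for i in rng.permutation(n):
        # <w_t, x_i> with w_t = (1 / (lam * n)) * sum_j alpha[j] * x_j
        score = Ka[i] / (lam * n)
        # Closed-form hinge-loss coordinate step (Shalev-Shwartz & Zhang):
        # project y_i * alpha_i plus the scaled residual onto [0, 1]
        cand = y[i] * alpha[i] + (1.0 - y[i] * score) * lam * n / max(K[i, i], 1e-12)
        delta = y[i] * min(1.0, max(0.0, cand)) - alpha[i]
        alpha[i] += delta
        Ka += delta * K[:, i]  # keep K @ alpha consistent after the update

# Hypothetical usage, given some precomputed K and labels y for task t:
#   n = K.shape[0]
#   alpha, Ka = np.zeros(n), np.zeros(n)
#   rng = np.random.default_rng(0)
#   for _ in range(20):
#       sdca_hinge_epoch(K, y, alpha, Ka, lam=1e-3, rng=rng)

Maintaining the running vector $K\alpha_t$ makes each coordinate step $O(n)$ and a full epoch $O(n^2)$, which matches the cost structure one expects once kernel matrices are precomputed.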
