Multi-task feature and kernel selection for SVMs

We compute a common feature selection or kernel selection configuration for multiple support vector machines (SVMs) trained on different yet inter-related datasets. The method is advantageous when multiple classification tasks with differently labeled datasets exist over a common input space, because the different datasets can then mutually reinforce a common choice of representation or relevant features for their various classifiers. We derive a multi-task representation learning approach using the maximum entropy discrimination formalism. The resulting algorithms are convex and maintain the global solution properties of support vector machines. However, in addition to the multiple SVM classification/regression parameters, they also jointly estimate an optimal subset of features or an optimal combination of kernels. Experiments are shown on standardized datasets.
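The core idea of the abstract — several tasks over a common input space pooling evidence for one shared representation — can be illustrated with a toy sketch. This is not the paper's maximum entropy discrimination algorithm; it is a hypothetical stand-in in which each task scores features by absolute correlation with its labels, the scores are summed across tasks, and the top-k features become the common subset on which every task's SVM would then be trained.

```python
import numpy as np

# Illustrative sketch (not the paper's MED algorithm): multiple tasks over a
# common input space vote for a shared subset of relevant features.

rng = np.random.default_rng(0)

def make_task(n=200, d=10, informative=(0, 1, 2)):
    """Synthetic binary task whose labels depend only on `informative` features."""
    X = rng.standard_normal((n, d))
    w = np.zeros(d)
    w[list(informative)] = [2.0, -1.5, 1.0]  # relevant directions shared by all tasks
    y = np.sign(X @ w + 0.1 * rng.standard_normal(n))
    return X, y

def feature_scores(X, y):
    """Per-task relevance: absolute correlation of each feature with the label."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.abs(Xc.T @ y) / len(y)

tasks = [make_task() for _ in range(3)]

# Pool evidence across tasks: the datasets mutually reinforce a common choice.
total = sum(feature_scores(X, y) for X, y in tasks)
k = 3
shared = np.sort(np.argsort(total)[-k:])
print("shared features:", shared.tolist())  # the jointly informative features [0, 1, 2]
```

In the paper's formulation this joint selection is solved inside a convex MED problem together with the SVM parameters, rather than by the greedy scoring heuristic used here for illustration.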