Second-order optimization of kernel parameters

In kernel methods such as SVMs, the data representation, chosen implicitly through the kernel K(x,x′), strongly influences performance. Recent applications [3] and developments based on SVMs have shown that using multiple kernels instead of a single one can make the decision function easier to interpret and improve classifier performance. In such cases, a common approach is to take the kernel K(x,x′) to be a convex combination of basis kernels: