Abstract

Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior that encodes useful domain knowledge. We present an algorithm that constructs such an informative prior for a given discrete supervised learning task. The algorithm uses other “similar” learning problems to discover properties of optimal classifiers, expressed as covariance estimates for pairs of feature parameters. A semidefinite program is then used to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification and demonstrate a 20 to 40% error reduction over a commonly used prior.
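
To make the "combine estimates via a semidefinite program" step concrete, below is a minimal illustrative sketch, not the paper's exact program: it assumes a hypothetical dictionary of noisy pairwise covariance estimates (here called `estimates`) gathered from related tasks, and fits a prior covariance matrix to them in least squares, subject to positive semidefiniteness so the result is a valid Gaussian prior covariance. The use of `cvxpy` and the least-squares objective are assumptions made for illustration.

```python
import numpy as np
import cvxpy as cp

# Hypothetical pairwise covariance estimates for feature parameters,
# as might be gathered from "similar" learning problems.
# The keys (i, j) index parameter pairs; the values are noisy estimates.
n = 4  # number of feature parameters (illustrative)
estimates = {(0, 0): 1.0, (1, 1): 1.2, (0, 1): 0.8, (2, 3): -0.3}

# Decision variable: the prior covariance matrix, constrained symmetric.
Sigma = cp.Variable((n, n), symmetric=True)

# Fit the available pairwise estimates in least squares ...
residuals = [Sigma[i, j] - v for (i, j), v in estimates.items()]
objective = cp.Minimize(cp.sum_squares(cp.hstack(residuals)))

# ... subject to Sigma being positive semidefinite, which is what
# makes this a semidefinite program and guarantees a valid covariance.
constraints = [Sigma >> 0]

cp.Problem(objective, constraints).solve()
print(np.round(Sigma.value, 3))
```

The PSD constraint is the essential ingredient: individual pairwise estimates taken together need not form a consistent covariance matrix, and projecting onto the PSD cone reconciles them into one usable as a Gaussian prior.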