Joint Kernel Support Estimation for Structured Prediction

We present a new technique for structured prediction that works in a hybrid generative/discriminative way, using a one-class support vector machine to model the joint probability of (input, output)-pairs in a joint reproducing kernel Hilbert space. Compared to discriminative techniques, like conditional random fields or structured output SVMs, the proposed method has the advantage that its training time depends only on the number of training examples, not on the size of the label space. Due to its generative aspect, it is also very tolerant against ambiguous, incomplete or incorrect labels. Experiments on realistic data show that our method works efficiently and robustly in situations for which discriminative techniques have computational or statistical problems.

[1]  Chih-Jen Lin,et al.  Trust Region Newton Method for Logistic Regression , 2008, J. Mach. Learn. Res..

[2]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[3]  Xiaojin Zhu,et al.  Kernel conditional random fields: representation and clique selection , 2004, ICML.

[4]  Jason Weston,et al.  Fast Kernel Classifiers with Online and Active Learning , 2005, J. Mach. Learn. Res..

[5]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[6]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[7]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[8]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[9]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[10]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[11]  Kashima Hisashi,et al.  Kernel-based Discriminative Learning Algorithms for Labeling Sequences, Trees and Graphs , 2004 .

[12]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[13]  Hisashi Kashima,et al.  Kernel-based discriminative learning algorithms for labeling sequences, trees, and graphs , 2004, ICML '04.

[14]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.