Large Margin Methods for Structured and Interdependent Output Variables

Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. To this end, we generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. Although this leads to a quadratic program with a potentially prohibitive (i.e., exponential) number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments in various domains involving different types of output spaces demonstrate the breadth and generality of our approach.
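
To make the cutting plane idea concrete, here is a minimal Python sketch of constraint generation for a toy multiclass problem. It is an illustration under stated assumptions, not the paper's implementation: the block-wise feature map `psi`, the 0/1 `loss`, the margin-rescaled form of the constraints, and all function names and parameter values are hypothetical choices. The re-solve of the quadratic program over the working set is replaced by plain subgradient descent, and re-optimization happens once per pass rather than after each added constraint, purely to keep the example short and dependency-free.

```python
def psi(x, y, k):
    """Hypothetical joint feature map: place x in the block belonging to label y."""
    out = [0.0] * (len(x) * k)
    out[y * len(x):(y + 1) * len(x)] = x
    return out

def score(w, x, y, k):
    return sum(wi * fi for wi, fi in zip(w, psi(x, y, k)))

def loss(y_true, y):
    # Assumed 0/1 loss; structured problems would use a task-specific loss.
    return 0.0 if y == y_true else 1.0

def slack(w, x, y_true, y, k):
    """Violation of the (assumed) margin-rescaled constraint for competitor y."""
    return loss(y_true, y) - (score(w, x, y_true, k) - score(w, x, y, k))

def most_violated(w, x, y_true, k):
    """Separation oracle: brute force here, since the toy label set is tiny."""
    return max(range(k), key=lambda y: slack(w, x, y_true, y, k))

def fit(data, k, C=10.0, eps=1e-3, max_rounds=50, inner_steps=300, lr=0.02):
    n = len(data)
    w = [0.0] * (len(data[0][0]) * k)
    working = [set() for _ in data]  # one working set of competitors per example
    for _ in range(max_rounds):
        added = False
        for i, (x, y) in enumerate(data):
            y_hat = most_violated(w, x, y, k)
            # Current slack implied by the constraints already in the working set.
            xi = max([0.0] + [slack(w, x, y, c, k) for c in working[i]])
            if slack(w, x, y, y_hat, k) > xi + eps:
                working[i].add(y_hat)
                added = True
        if not added:
            break  # no constraint is violated by more than eps
        # Stand-in for the QP re-solve: subgradient descent on
        # 0.5*||w||^2 + (C/n) * sum_i max(0, max over working-set slacks).
        for _ in range(inner_steps):
            grad = list(w)
            for i, (x, y) in enumerate(data):
                if not working[i]:
                    continue
                c = max(working[i], key=lambda c: slack(w, x, y, c, k))
                if slack(w, x, y, c, k) > 0.0:
                    d = [a - b for a, b in zip(psi(x, y, k), psi(x, c, k))]
                    grad = [g - (C / n) * di for g, di in zip(grad, d)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def predict(w, x, k):
    return max(range(k), key=lambda y: score(w, x, y, k))
```

With three labels, each example has only two competing outputs, so the oracle can enumerate them. For sequences or trees the oracle would instead be a task-specific search (for example, dynamic programming), which is what keeps the working set small even when the full constraint set is exponential.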
