Multi-core Structural SVM Training

Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVMs) are a popular approach for training structured predictors, in which learning is cast as an optimization problem. Most structural SVM solvers alternate between a model update phase and an inference phase that predicts structures for all training examples. As structures become more complex, inference becomes a bottleneck and slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs, called DEMIDCD, that extends the dual coordinate descent approach by decoupling the model update and inference phases into separate threads. We take advantage of multi-core hardware to parallelize learning with minimal synchronization between the model update and inference phases. We prove that our algorithm converges, and we validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, our algorithm fully utilizes all available processors to speed up learning and achieves competitive performance. For example, it reaches a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard multi-threaded dual coordinate descent implementation with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
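To make the decoupling concrete, the following is a minimal, illustrative Python sketch of the architecture described above, not the paper's actual implementation: several inference threads repeatedly solve loss-augmented inference and enqueue violated structures, while a single learning thread consumes them and performs dual coordinate descent updates on a shared weight vector, with only a lightweight lock between the two phases. The toy three-label task, the names `loss_augmented_inference`, `inference_worker`, and `learner`, and the per-constraint box projection are all assumptions made for the sketch.

```python
# Sketch of a decoupled inference/learning loop in the spirit of DEMIDCD.
# All task details below are toy assumptions; a real structured task would
# have an exponentially large structure space, not three labels.

import queue
import random
import threading

import numpy as np

DIM = 8            # toy feature dimension
C = 1.0            # SVM regularization constant
N_INFER = 4        # number of inference threads

# Toy training set: each example is (one feature vector per label, gold label).
rng = np.random.default_rng(0)
DATA = [(rng.normal(size=(3, DIM)), int(rng.integers(3))) for _ in range(100)]

w = np.zeros(DIM)                 # shared model, updated only by the learner
alpha = {}                        # dual variables, one per (example, structure)
work = queue.Queue(maxsize=256)   # candidate structures found by inference
w_lock = threading.Lock()         # minimal synchronization around w


def loss_augmented_inference(i):
    """Find the structure maximizing model score plus task loss (assumed 0/1)."""
    feats, gold = DATA[i]
    with w_lock:
        scores = feats @ w
    scores += np.array([0.0 if y == gold else 1.0 for y in range(3)])
    return int(np.argmax(scores))


def inference_worker(stop):
    # Inference phase: pick an example, solve inference, enqueue violations.
    while not stop.is_set():
        i = random.randrange(len(DATA))
        y = loss_augmented_inference(i)
        if y != DATA[i][1]:
            try:
                work.put((i, y), timeout=0.1)
            except queue.Full:
                pass


def learner(stop, n_updates=2000):
    # Model update phase: dual coordinate descent over enqueued structures.
    for _ in range(n_updates):
        try:
            i, y = work.get(timeout=0.5)
        except queue.Empty:
            continue
        feats, gold = DATA[i]
        dphi = feats[gold] - feats[y]   # feature difference for this constraint
        loss = 1.0                      # assumed constant task loss
        with w_lock:
            grad = loss - w @ dphi
            a = alpha.get((i, y), 0.0)
            # NOTE: clipping each alpha to [0, C] independently is a
            # simplification; the true structural dual couples the alphas
            # that belong to the same example.
            step = min(max(a + grad / (dphi @ dphi + 1e-12), 0.0), C) - a
            alpha[(i, y)] = a + step
            w += step * dphi            # maintain w = sum of alpha * dphi
    stop.set()


stop = threading.Event()
for _ in range(N_INFER):
    threading.Thread(target=inference_worker, args=(stop,), daemon=True).start()
learner(stop)
acc = np.mean([np.argmax(f @ w) == g for f, g in DATA])
print(f"toy training accuracy: {acc:.2f}")
```

Because the inference threads only read the model and the learner only dequeues finished inference results, neither phase blocks on the other for long, which is the property the paper exploits to keep all processors busy.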
