Multi-core Structural SVM Training

Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVMs) are a popular approach for training structured predictors, in which learning is cast as an optimization problem. Most structural SVM solvers alternate between a model update phase and an inference phase that predicts structures for all training examples. As structures become more complex, inference becomes a bottleneck and slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs, called DEMIDCD, that extends the dual coordinate descent approach by decoupling the model update and inference phases into separate threads. We take advantage of multi-core hardware to parallelize learning with minimal synchronization between the model update and inference phases. We prove that our algorithm converges, and we validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, our algorithm fully utilizes all available processors to speed up learning and achieves competitive performance. For example, it reaches a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard multi-threaded dual coordinate descent implementation with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
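To make the decoupling concrete, the following is a minimal, illustrative Python sketch of the architecture described above, not the paper's actual implementation: several inference threads repeatedly solve loss-augmented inference and enqueue violated structures, while a single learning thread consumes them and performs dual coordinate descent updates on a shared weight vector, with only a lightweight lock between the two phases. The toy three-label task, the names `loss_augmented_inference`, `inference_worker`, and `learner`, and the per-constraint box projection are all assumptions made for the sketch.

```python
# Sketch of a decoupled inference/learning loop in the spirit of DEMIDCD.
# All task details below are toy assumptions; a real structured task would
# have an exponentially large structure space, not three labels.

import queue
import random
import threading

import numpy as np

DIM = 8            # toy feature dimension
C = 1.0            # SVM regularization constant
N_INFER = 4        # number of inference threads

# Toy training set: each example is (one feature vector per label, gold label).
rng = np.random.default_rng(0)
DATA = [(rng.normal(size=(3, DIM)), int(rng.integers(3))) for _ in range(100)]

w = np.zeros(DIM)                 # shared model, updated only by the learner
alpha = {}                        # dual variables, one per (example, structure)
work = queue.Queue(maxsize=256)   # candidate structures found by inference
w_lock = threading.Lock()         # minimal synchronization around w


def loss_augmented_inference(i):
    """Find the structure maximizing model score plus task loss (assumed 0/1)."""
    feats, gold = DATA[i]
    with w_lock:
        scores = feats @ w
    scores += np.array([0.0 if y == gold else 1.0 for y in range(3)])
    return int(np.argmax(scores))


def inference_worker(stop):
    # Inference phase: pick an example, solve inference, enqueue violations.
    while not stop.is_set():
        i = random.randrange(len(DATA))
        y = loss_augmented_inference(i)
        if y != DATA[i][1]:
            try:
                work.put((i, y), timeout=0.1)
            except queue.Full:
                pass


def learner(stop, n_updates=2000):
    # Model update phase: dual coordinate descent over enqueued structures.
    for _ in range(n_updates):
        try:
            i, y = work.get(timeout=0.5)
        except queue.Empty:
            continue
        feats, gold = DATA[i]
        dphi = feats[gold] - feats[y]   # feature difference for this constraint
        loss = 1.0                      # assumed constant task loss
        with w_lock:
            grad = loss - w @ dphi
            a = alpha.get((i, y), 0.0)
            # NOTE: clipping each alpha to [0, C] independently is a
            # simplification; the true structural dual couples the alphas
            # that belong to the same example.
            step = min(max(a + grad / (dphi @ dphi + 1e-12), 0.0), C) - a
            alpha[(i, y)] = a + step
            w += step * dphi            # maintain w = sum of alpha * dphi
    stop.set()


stop = threading.Event()
for _ in range(N_INFER):
    threading.Thread(target=inference_worker, args=(stop,), daemon=True).start()
learner(stop)
acc = np.mean([np.argmax(f @ w) == g for f, g in DATA])
print(f"toy training accuracy: {acc:.2f}")
```

Because the inference threads only read the model and the learner only dequeues finished inference results, neither phase blocks on the other for long, which is the property the paper exploits to keep all processors busy.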
