Large Margin Methods for Structured Output Prediction

Many real-world problems call for classification algorithms that can model structural dependencies among multiple labels and operate in a multivariate setting, that is, algorithms that must produce complex, non-scalar predictions for given input vectors. Examples include natural language parsing, speech recognition, machine translation, image segmentation, handwritten character recognition, and gene prediction.
