Large margin methods for structured classification: Exponentiated gradient algorithms and PAC-Bayesian generalization bounds

We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of both Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient – even in cases where the number of labels y is exponential in size – provided that certain expectations under Gibbs distributions can be calculated efficiently. The optimization method we use for structured labels relies on a more general result, specifically the application of exponentiated gradient (EG) updates [4, 5] to quadratic programs (QPs). We describe a new method for solving QPs based on these techniques, and give bounds on its rate of convergence. In addition to their application to the structured-labels task, the EG updates lead to simple algorithms for optimizing "conventional" binary or multiclass SVM problems. Finally, we give a new generalization bound for structured classification, using PAC-Bayesian methods for the analysis of large margin classifiers.
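To make the core optimization idea concrete, the following is a minimal sketch of exponentiated gradient (EG) updates applied to a toy quadratic program over the probability simplex, in the spirit of the multiplicative updates the abstract refers to. The matrix A, vector b, learning rate eta, and iteration count are illustrative assumptions, not values or the exact formulation from the paper.

```python
import numpy as np

def eg_qp(A, b, eta=0.1, iters=500):
    """Minimize 0.5 * a^T A a + b^T a over the simplex via EG updates.

    Toy sketch: EG takes a multiplicative step in the gradient and
    renormalizes, so the iterate stays a probability distribution.
    """
    n = len(b)
    alpha = np.full(n, 1.0 / n)                # start at the simplex centre
    for _ in range(iters):
        grad = A @ alpha + b                   # gradient of the quadratic objective
        alpha = alpha * np.exp(-eta * grad)    # multiplicative (EG) step
        alpha /= alpha.sum()                   # project back onto the simplex
    return alpha

# Illustrative positive-definite QP (assumed data, not from the paper)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([-1.0, 0.0])
alpha = eg_qp(A, b)
```

Because each update multiplies by a positive factor and renormalizes, the iterates remain nonnegative and sum to one automatically, which is what makes EG attractive for the dual QPs arising in large-margin training.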