Laplace maximum margin Markov networks

We propose Laplace max-margin Markov networks (LapM3N) and, more generally, a class of Bayesian M3N (BM3N), of which LapM3N is a special case with a sparse structural bias, for robust structured prediction. BM3N generalizes existing structured prediction rules, which are based on a point estimator, to a Bayes predictor that uses a learnt distribution over rules. We present a novel Structured Maximum Entropy Discrimination (SMED) formalism that combines Bayesian and max-margin learning of Markov networks for structured prediction; this approach subsumes the conventional M3N as a special case. We also offer an efficient learning algorithm, based on variational inference and standard convex-optimization solvers for M3N, and a generalization bound. Our method outperforms competing ones on both synthetic and real OCR data.
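
The contrast between a point-estimate rule and a Bayes predictor can be sketched as follows. This is an illustrative formulation only: the notation is assumed here and does not appear in the abstract. It uses a linear discriminant function F with feature map f, weight vector w, candidate output set Y(x), and learnt distribution p(w).

```latex
% Assumed notation (not from the abstract): F(x, y; w) = w^\top f(x, y)
% Point-estimate rule, as in a conventional M3N with a single weight vector w:
h_0(x; w) = \arg\max_{y \in \mathcal{Y}(x)} F(x, y; w)

% Bayes predictor: replace the single w by an average over a learnt distribution p(w):
h_1(x) = \arg\max_{y \in \mathcal{Y}(x)} \int p(w)\, F(x, y; w)\, \mathrm{d}w

% If F is linear in w, linearity of expectation gives
h_1(x) = \arg\max_{y \in \mathcal{Y}(x)} \mathbb{E}_{p}[w]^\top f(x, y)
```

Under the linearity assumption, the averaged rule reduces to a point-estimate rule evaluated at the mean of p(w), so prediction costs the same as for a single weight vector once that mean is known.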
