Multilabel Structured Output Learning with Random Spanning Trees of Max-Margin Markov Networks

We show that the usual score function for conditional Markov networks can be written as the expectation over the scores of their spanning trees. We also show that a small random sample of these output trees can attain a significant fraction of the margin obtained by the complete graph, and we provide conditions under which inference is tractable. Experimental results confirm that learning with this approach scales to realistic datasets.
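
The first claim can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's implementation: it fixes one input/output pair, takes the score of the complete graph to be a plain sum of per-edge scores, and draws those edge scores at random (the name `edge_score` and all helper functions are hypothetical). It then enumerates every spanning tree of the complete graph on n nodes and compares the complete-graph score with the average tree score. Each edge lies in a fraction 2/n of the spanning trees, so the two quantities should agree up to the factor n/2; the paper's own normalization may be stated differently.

```python
import itertools
import random


def is_spanning_tree(n, edges):
    """Return True if `edges` (n - 1 pairs) form a spanning tree on n nodes."""
    parent = list(range(n))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    # n - 1 acyclic edges on n nodes are necessarily connected.
    return len(edges) == n - 1


def spanning_trees(n):
    """Enumerate all spanning trees of the complete graph on n nodes."""
    all_edges = list(itertools.combinations(range(n), 2))
    return [t for t in itertools.combinations(all_edges, n - 1)
            if is_spanning_tree(n, t)]


def complete_graph_score(edge_score):
    """Score of the complete graph: sum of all edge scores (assumed form)."""
    return sum(edge_score.values())


def expected_tree_score(n, edge_score):
    """Average, over uniformly drawn spanning trees, of the tree's edge-score sum."""
    trees = spanning_trees(n)
    return sum(sum(edge_score[e] for e in t) for t in trees) / len(trees)


n = 5
rng = random.Random(0)
# Hypothetical edge scores standing in for per-edge feature/weight inner products.
edge_score = {e: rng.gauss(0.0, 1.0)
              for e in itertools.combinations(range(n), 2)}

full = complete_graph_score(edge_score)
avg = expected_tree_score(n, edge_score)
print(abs(full - (n / 2) * avg) < 1e-9)
```

Exhaustive enumeration is only feasible for very small n, since the number of spanning trees grows as n^(n-2); it is used here purely to verify the identity, whereas the approach described above relies on a small random sample of trees.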
