Structured prediction via output space search

We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and then returning the least-cost output uncovered during the search. This framework can be instantiated for a wide range of search spaces and search procedures, and easily incorporates arbitrary structured-prediction loss functions. In this paper, we make two main technical contributions. First, we describe a novel approach to automatically defining an effective search space over structured outputs, which is able to leverage the availability of powerful classification learning algorithms. In particular, we define the limited-discrepancy search space and relate the quality of that space to the quality of learned classifiers. We also define a sparse version of the search space to improve the efficiency of our overall approach. Second, we give a generic cost function learning approach that is applicable to a wide range of search procedures. The key idea is to learn a cost function that attempts to mimic the behavior of conducting searches guided by the true loss function. Our experiments on six benchmark domains show that a small amount of search in the limited-discrepancy search space is often sufficient for significantly improving on state-of-the-art structured-prediction performance. We also demonstrate significant speed improvements for our approach using sparse search spaces with little or no loss in accuracy.
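The framework described above can be illustrated with a minimal sketch: a learned greedy classifier produces an initial complete output, a limited-discrepancy neighborhood around it defines the search space, and a learned cost function selects the least-cost output found. All function names and the sequence-labeling setting below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of output-space search with limited discrepancy search (LDS).
# Assumes a sequence-labeling task; `classifier` and `cost_fn` stand in for the
# learned recurrent classifier and learned cost function described in the text.
from itertools import combinations, product

def greedy_output(x, classifier, labels):
    """Produce an initial complete output using a learned greedy classifier."""
    return [classifier(x, i) for i in range(len(x))]

def lds_neighborhood(y0, labels, max_disc):
    """Enumerate outputs differing from y0 in at most max_disc positions
    (i.e., at most max_disc 'discrepancies' from the greedy trajectory)."""
    yield list(y0)
    n = len(y0)
    for d in range(1, max_disc + 1):
        for positions in combinations(range(n), d):
            # At each chosen position, substitute any label other than the greedy one.
            alternatives = [[l for l in labels if l != y0[p]] for p in positions]
            for replacement in product(*alternatives):
                y = list(y0)
                for p, l in zip(positions, replacement):
                    y[p] = l
                yield y

def predict(x, classifier, cost_fn, labels, max_disc=2):
    """Return the least-cost output found in the LDS space around the greedy output."""
    y0 = greedy_output(x, classifier, labels)
    return min(lds_neighborhood(y0, labels, max_disc), key=lambda y: cost_fn(x, y))
```

The sketch makes the key property concrete: if the greedy classifier is mostly correct, the true output lies within a small discrepancy radius of the greedy output, so even a small search budget can recover it.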
