Structured prediction via output space search

We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and then returning the least-cost output uncovered during the search. This framework can be instantiated for a wide range of search spaces and search procedures, and easily incorporates arbitrary structured-prediction loss functions. In this paper, we make two main technical contributions. First, we describe a novel approach to automatically defining an effective search space over structured outputs, which is able to leverage the availability of powerful classification learning algorithms. In particular, we define the limited-discrepancy search space and relate the quality of that space to the quality of learned classifiers. We also define a sparse version of the search space to improve the efficiency of our overall approach. Second, we give a generic cost function learning approach that is applicable to a wide range of search procedures. The key idea is to learn a cost function that attempts to mimic the behavior of conducting searches guided by the true loss function. Our experiments on six benchmark domains show that a small amount of search in the limited-discrepancy search space is often sufficient for significantly improving on state-of-the-art structured-prediction performance. We also demonstrate significant speed improvements for our approach using sparse search spaces with little or no loss in accuracy.
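The framework described above can be illustrated with a minimal sketch: a learned greedy classifier produces an initial complete output, a limited-discrepancy neighborhood around it defines the search space, and a learned cost function selects the least-cost output found. All function names and the sequence-labeling setting below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of output-space search with limited discrepancy search (LDS).
# Assumes a sequence-labeling task; `classifier` and `cost_fn` stand in for the
# learned recurrent classifier and learned cost function described in the text.
from itertools import combinations, product

def greedy_output(x, classifier, labels):
    """Produce an initial complete output using a learned greedy classifier."""
    return [classifier(x, i) for i in range(len(x))]

def lds_neighborhood(y0, labels, max_disc):
    """Enumerate outputs differing from y0 in at most max_disc positions
    (i.e., at most max_disc 'discrepancies' from the greedy trajectory)."""
    yield list(y0)
    n = len(y0)
    for d in range(1, max_disc + 1):
        for positions in combinations(range(n), d):
            # At each chosen position, substitute any label other than the greedy one.
            alternatives = [[l for l in labels if l != y0[p]] for p in positions]
            for replacement in product(*alternatives):
                y = list(y0)
                for p, l in zip(positions, replacement):
                    y[p] = l
                yield y

def predict(x, classifier, cost_fn, labels, max_disc=2):
    """Return the least-cost output found in the LDS space around the greedy output."""
    y0 = greedy_output(x, classifier, labels)
    return min(lds_neighborhood(y0, labels, max_disc), key=lambda y: cost_fn(x, y))
```

The sketch makes the key property concrete: if the greedy classifier is mostly correct, the true output lies within a small discrepancy radius of the greedy output, so even a small search budget can recover it.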
