Learning and Inference for Structured Prediction: A Unifying Perspective

In a structured prediction problem, one needs to learn a predictor that, given a structured input, produces a structured output, such as a sequence, tree, or clustering. Prototypical structured prediction tasks include part-of-speech (POS) tagging (predicting the POS tag sequence for an input sentence) and semantic segmentation of images (predicting semantic labels for the pixels of an input image). Unlike simple classification problems, these tasks require assigning values to multiple output variables while accounting for the dependencies between them. Consequently, the prediction step itself (also known as “inference” or “decoding”) is computationally expensive, and so is the learning process, which typically requires making predictions as part of it. The key learning and inference challenges stem from the exponential size of the structured output space and depend on its complexity. In this paper, we present a unifying perspective on the different frameworks that address structured prediction problems and compare them in terms of their strengths and weaknesses. We also discuss important research directions, including the integration of deep learning advances into structured prediction methods, and learning from weakly supervised signals and active querying to overcome the challenges of building structured predictors from small amounts of labeled data.
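To make the cost of inference concrete, the following minimal sketch contrasts two ways of decoding a chain-structured tagging model. It is an illustration under stated assumptions, not a method taken from the paper: the scoring model (a `unary` matrix of per-position label scores plus a `pairwise` matrix of label-transition scores) and all function names are hypothetical. Brute-force decoding enumerates all K^T label sequences, while Viterbi dynamic programming exploits the chain structure to find the same maximum in O(T K^2) time.

```python
import itertools
import numpy as np

def score(unary, pairwise, labels):
    """Score of a label sequence: per-position scores plus transition scores."""
    s = sum(unary[t, y] for t, y in enumerate(labels))
    s += sum(pairwise[a, b] for a, b in zip(labels, labels[1:]))
    return float(s)

def brute_force_decode(unary, pairwise):
    """Enumerate all K**T label sequences; feasible only for tiny problems."""
    T, K = unary.shape
    return max(itertools.product(range(K), repeat=T),
               key=lambda y: score(unary, pairwise, y))

def viterbi_decode(unary, pairwise):
    """Exact argmax for a chain model in O(T * K^2) via dynamic programming."""
    T, K = unary.shape
    best = unary[0].copy()               # best[b]: top score of a prefix ending in label b
    back = np.zeros((T, K), dtype=int)   # back[t, b]: best previous label before b at step t
    for t in range(1, T):
        cand = best[:, None] + pairwise  # cand[a, b]: best prefix ending in a, then moving to b
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + unary[t]
    labels = [int(best.argmax())]
    for t in range(T - 1, 0, -1):        # follow back-pointers from the last position
        labels.append(int(back[t, labels[-1]]))
    return tuple(reversed(labels))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unary = rng.normal(size=(6, 4))      # T = 6 positions, K = 4 labels (toy sizes)
    pairwise = rng.normal(size=(4, 4))
    print(brute_force_decode(unary, pairwise))
    print(viterbi_decode(unary, pairwise))
```

Both decoders reach the same maximum score on this toy problem, but only the dynamic program scales to realistic sequence lengths. For output structures without such a tractable decomposition, exact inference of this kind is generally not available, which is one reason approximate and search-based approaches are used.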
