Search-based structured prediction

We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
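The reduction described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes sequence labeling under Hamming loss, where the cost of each action is simply 0 or 1 against the gold label, so no roll-out is needed. The names `features`, `TableClassifier`, `searn`, and `decode` are hypothetical, and the table-based cost-sensitive learner is a toy stand-in for whatever classifier one would actually plug in. Each iteration rolls in with the current policy (a mixture of the reference policy and previously learned classifiers), collects one classification example per decision, trains a new classifier, and interpolates it into the mixture with weight `beta`.

```python
import random
from collections import defaultdict

def features(tokens, t, prev_label):
    # Hypothetical feature function: current token plus the previously predicted label.
    return (tokens[t], prev_label)

class TableClassifier:
    """Toy cost-sensitive learner: per feature, pick the action with the lowest total cost."""
    def __init__(self, labels):
        self.labels = labels
        self.cost = defaultdict(lambda: defaultdict(float))

    def fit(self, examples):
        for feat, costs in examples:
            for action, c in costs.items():
                self.cost[feat][action] += c
        return self

    def predict(self, feat):
        if feat not in self.cost:
            return self.labels[0]  # arbitrary fallback for unseen features
        return min(self.labels, key=lambda a: self.cost[feat][a])

def searn(data, labels, iterations=5, beta=0.5, seed=0):
    rng = random.Random(seed)
    mixture = []  # (weight, classifier) pairs; the leftover mass belongs to the reference policy

    def act(tokens, gold, t, prev):
        # Sample one component of the current mixture policy and let it choose the action.
        r = rng.random()
        for w, clf in mixture:
            if r < w:
                return clf.predict(features(tokens, t, prev))
            r -= w
        return gold[t]  # reference policy: under Hamming loss, emit the gold label

    for _ in range(iterations):
        examples = []
        for tokens, gold in data:
            prev = "<s>"
            for t in range(len(tokens)):
                # Assumption: with Hamming loss, an action's cost is 0/1 against the gold label.
                costs = {a: float(a != gold[t]) for a in labels}
                examples.append((features(tokens, t, prev), costs))
                prev = act(tokens, gold, t, prev)  # roll in with the current policy
        clf = TableClassifier(labels).fit(examples)
        # Interpolate: the new classifier gets weight beta, everything older is scaled by 1 - beta.
        mixture = [(w * (1 - beta), c) for w, c in mixture] + [(beta, clf)]
    return mixture

def decode(mixture, tokens):
    # Simplification: use only the most recent classifier rather than sampling from the mixture.
    _, clf = mixture[-1]
    prev, out = "<s>", []
    for t in range(len(tokens)):
        prev = clf.predict(features(tokens, t, prev))
        out.append(prev)
    return out

data = [("the dog barks".split(), ["D", "N", "V"]),
        ("a cat sleeps".split(), ["D", "N", "V"])]
mixture = searn(data, ["D", "N", "V"])
print(decode(mixture, "a dog sleeps".split()))
```

For losses that do not decompose over positions, the 0/1 cost line would have to be replaced by an actual roll-out: complete the structure with the current policy after each candidate action and measure the resulting loss.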
