Advanced Structured Prediction

The goal of structured prediction is to build machine learning models that predict relational information that itself has structure, such as being composed of multiple interrelated parts. These models, which reflect prior knowledge, task-specific relations, and constraints, are used in fields including computer vision, speech recognition, natural language processing, and computational biology. They can carry out such tasks as predicting a natural language sentence, or segmenting an image into meaningful components. These models are expressive and powerful, but exact computation is often intractable. A broad research effort in recent years has aimed at designing structured prediction models and approximate inference and learning procedures that are computationally efficient. This volume offers an overview of this recent research in order to make the work accessible to a broader research community. The chapters, by leading researchers in the field, cover a range of topics, including research trends, the linear programming relaxation approach, innovations in probabilistic modeling, recent theoretical progress, and resource-aware learning.

Sebastian Nowozin is a Researcher in the Machine Learning and Perception group (MLP) at Microsoft Research, Cambridge, England. Peter V. Gehler is a Senior Researcher in the Perceiving Systems group at the Max Planck Institute for Intelligent Systems, Tübingen, Germany. Jeremy Jancsary is a Senior Research Scientist at Nuance Communications, Vienna. Christoph H. Lampert is Assistant Professor at the Institute of Science and Technology Austria, where he heads a group for Computer Vision and Machine Learning.

Contributors: Jonas Behr, Yutian Chen, Fernando De La Torre, Justin Domke, Peter V. Gehler, Andrew E. Gelfand, Sébastien Giguère, Amir Globerson, Fred A. Hamprecht, Minh Hoai, Tommi Jaakkola, Jeremy Jancsary, Joseph Keshet, Marius Kloft, Vladimir Kolmogorov, Christoph H. Lampert, François Laviolette, Xinghua Lou, Mario Marchand, André F. T. Martins, Ofer Meshi, Sebastian Nowozin, George Papandreou, Daniel Průša, Gunnar Rätsch, Amélie Rolland, Bogdan Savchynskyy, Stefan Schmidt, Thomas Schoenemann, Gabriele Schweikert, Ben Taskar, Sinisa Todorovic, Max Welling, David Weiss, Tomáš Werner, Alan Yuille, Stanislav Živný


[256]  Ryan O'Donnell,et al.  Linear programming, width-1 CSPs, and robust satisfaction , 2012, ITCS '12.

[257]  Eli Shechtman,et al.  Space-Time Behavior-Based Correlation-OR-How to Tell If Two Underlying Motion Fields Are Similar Without Computing Them? , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[258]  Joachim M. Buhmann,et al.  Entropy and Margin Maximization for Structured Output Learning , 2010, ECML/PKDD.

[259]  Patrick Pérez,et al.  Retrieving actions in movies , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[260]  Eraldo Rezende Fernandes,et al.  Learning from Partially Annotated Sequences , 2011, ECML/PKDD.

[261]  Matthias W. Seeger,et al.  Fast Convergent Algorithms for Expectation Propagation Approximate Bayesian Inference , 2010, AISTATS.

[262]  Marc Gyssens,et al.  Closure properties of constraints , 1997, JACM.

[263]  Satoru Iwata,et al.  Bisubmodular Function Minimization , 2005, SIAM J. Discret. Math..

[264]  M. Mézard,et al.  Information, Physics, and Computation , 2009 .

[265]  Daniel Marcu,et al.  Statistics-Based Summarization - Step One: Sentence Compression , 2000, AAAI/IAAI.

[266]  Tommi S. Jaakkola,et al.  Introduction to dual decomposition for inference , 2011 .

[267]  Stark C. Draper,et al.  Divide and Concur and Difference-Map BP Decoders for LDPC Codes , 2011, IEEE Transactions on Information Theory.

[268]  Martin J. Wainwright,et al.  Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes , 2010, J. Mach. Learn. Res..

[269]  Matthias W. Seeger,et al.  Large Scale Bayesian Inference and Experimental Design for Sparse Linear Models , 2011, SIAM J. Imaging Sci..

[270]  Gunnar Rätsch,et al.  An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis , 2008, NIPS.

[271]  Yuji Matsumoto,et al.  Statistical Dependency Analysis with Support Vector Machines , 2003, IWPT.

[272]  Joseph K. Bradley,et al.  Parallel Coordinate Descent for L1-Regularized Loss Minimization , 2011, ICML.

[273]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[274]  Ben Taskar,et al.  Learning from Partial Labels , 2011, J. Mach. Learn. Res..

[275]  Hao Zhang,et al.  Generalized Higher-Order Dependency Parsing with Cube Pruning , 2012, EMNLP.

[276]  Daniel Berend,et al.  A Reverse Pinsker Inequality , 2012, ArXiv.

[277]  Eric P. Xing,et al.  Turbo Parsers: Dependency Parsing by Approximate Variational Inference , 2010, EMNLP.

[278]  T. Kailath.  The Divergence and Bhattacharyya Distance Measures in Signal Selection , 1967 .

[279]  Tao Wang,et al.  Semantic Event Detection using Conditional Random Fields , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[280]  Vladimir Kolmogorov,et al.  The Power of Linear Programming for Finite-Valued CSPs: A Constructive Characterization , 2013, ICALP.

[281]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[282]  Stanislav Živný,et al.  The Complexity of Valued Constraint Satisfaction Problems , 2012, Cognitive Technologies.

[283]  Tom Heskes,et al.  On the Uniqueness of Loopy Belief Propagation Fixed Points , 2004, Neural Computation.

[284]  Sebastian Nowozin,et al.  Discriminative Subsequence Mining for Action Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[285]  Carsten Rother,et al.  Weakly supervised discriminative localization and classification: a joint learning process , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[286]  Michael Patriksson,et al.  Ergodic, primal convergence in dual subgradient schemes for convex programming , 1999, Mathematical programming.

[287]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[288]  Kenneth Steiglitz,et al.  Combinatorial Optimization: Algorithms and Complexity , 1981 .

[289]  Cristian Sminchisescu,et al.  Conditional models for contextual human motion recognition , 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[290]  S. Lai,et al.  Learning partially-observed hidden conditional random fields for facial expression recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[291]  M. Shlezinger.  Syntactic analysis of two-dimensional visual signals in the presence of noise , 1976 .

[292]  M. J. D. Powell,et al.  A method for nonlinear constraints in minimization problems , 1969 .

[293]  Bogdan Savchynskyy,et al.  Getting Feasible Variable Estimates from Infeasible Ones: MRF Local Polytope Study , 2012, 2013 IEEE International Conference on Computer Vision Workshops.

[294]  Jeremy Jancsary,et al.  Convergent Decomposition Solvers for Tree-reweighted Free Energies , 2011, AISTATS.

[295]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[296]  Andrew McCallum,et al.  Collective Segmentation and Labeling of Distant Entities in Information Extraction , 2004 .

[297]  Dmitry M. Malioutov,et al.  Low-Rank Variance Approximation in GMRF Models: Single and Multiscale Approaches , 2008, IEEE Transactions on Signal Processing.

[298]  Dmitrij Schlesinger,et al.  Exact Solution of Permuted Submodular MinSum Problems , 2007, EMMCVPR.

[299]  Pushmeet Kohli,et al.  Measuring uncertainty in graph cut solutions , 2008, Comput. Vis. Image Underst..

[300]  Fu Jie Huang,et al.  A Tutorial on Energy-Based Learning , 2006 .

[301]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[302]  K. Matusita.  On the notion of affinity of several distributions and some of its applications , 1967 .

[303]  Andrew Gelfand,et al.  Integrating local classifiers through nonlinear dynamics on label graphs with an application to image segmentation , 2011, 2011 International Conference on Computer Vision.

[304]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[305]  Thomas J. Schaefer,et al.  The complexity of satisfiability problems , 1978, STOC.

[306]  Richard S. Zemel,et al.  HOP-MAP: Efficient Message Passing with High Order Potentials , 2010, AISTATS.

[307]  Tommi S. Jaakkola,et al.  Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations , 2007, NIPS.

[308]  John D. Lafferty,et al.  Semi-supervised learning using randomized mincuts , 2004, ICML.

[309]  Hilary Buxton,et al.  Comparison of Feedforward (TDRBF) and Generative (TDRGBN) Network for Gesture Based Control , 2001, Gesture Workshop.

[310]  F. Hadlock,et al.  Finding a Maximum Cut of a Planar Graph in Polynomial Time , 1975, SIAM J. Comput..

[311]  Brigham Anderson,et al.  Active learning for Hidden Markov Models: objective functions and algorithms , 2005, ICML.

[312]  Gunnar Rätsch,et al.  Active Learning in the Drug Discovery Process , 2001, NIPS.

[313]  Vladimir Kolmogorov,et al.  The Power of Linear Programming for General-Valued CSPs , 2013, SIAM J. Comput..

[314]  Vladimir Kolmogorov,et al.  Interactive Foreground Extraction using graph cut , 2011 .

[315]  Geir Dahl,et al.  Lagrangian-based methods for finding MAP solutions for MRF models , 2000, IEEE Trans. Image Process..

[316]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[317]  Tamir Hazan,et al.  Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference , 2009, IEEE Transactions on Information Theory.

[318]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[319]  Li Wang,et al.  Discriminative human action segmentation and recognition using semi-Markov model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[320]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[321]  William T. Freeman,et al.  Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology , 1999, Neural Computation.

[322]  Tommi S. Jaakkola,et al.  Convergence Rate Analysis of MAP Coordinate Minimization Algorithms , 2012, NIPS.

[323]  Dan Klein,et al.  Jointly Learning to Extract and Compress , 2011, ACL.

[324]  Dimitri P. Bertsekas,et al.  On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators , 1992, Math. Program..

[325]  Alexander M. Rush,et al.  Dual Decomposition for Parsing with Non-Projective Head Automata , 2010, EMNLP.