Localized Structured Prediction

Key to structured prediction is exploiting the problem structure to simplify the learning process. A major challenge arises when data exhibit a local structure (e.g., are made of "parts") that can be leveraged to better approximate the relation between (parts of) the input and (parts of) the output. Recent literature on signal processing, and in particular computer vision, has shown that capturing this local structure is essential to achieving state-of-the-art performance. While such algorithms are typically derived on a case-by-case basis, in this work we propose the first theoretical framework to treat part-based data from a general perspective. We derive a novel approach to these problems and study its generalization properties within the setting of statistical learning theory. Our analysis is novel in that it explicitly quantifies the benefits of leveraging the part-based structure of the problem in terms of the learning rates of the proposed estimator.
