A Smoother Way to Train Structured Prediction Models
Zaïd Harchaoui | Sham M. Kakade | Venkata Krishna Pillutla | Vincent Roulet
[1] Alexander J. Smola,et al. Bundle Methods for Regularized Risk Minimization , 2010, J. Mach. Learn. Res..
[2] Koen E. A. van de Sande,et al. Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.
[3] Tommi S. Jaakkola,et al. Learning Efficiently with Approximate Inference via Dual Losses , 2010, ICML.
[4] J. Andrew Bagnell,et al. (Approximate) Subgradient Methods for Structured Prediction , 2007, International Conference on Artificial Intelligence and Statistics.
[5] Zaïd Harchaoui,et al. On learning to localize objects with minimal supervision , 2014, ICML.
[6] Rina Dechter,et al. Searching for the M Best Solutions in Graphical Models , 2016, J. Artif. Intell. Res..
[7] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[8] Arthur Mensch,et al. Differentiable Dynamic Programming for Structured Prediction and Attention , 2018, ICML.
[9] Thomas Hofmann,et al. Hidden Markov Support Vector Machines , 2003, ICML.
[10] Thomas Deselaers,et al. Localizing Objects While Learning Their Appearance , 2010, ECCV.
[11] James V. Burke,et al. Descent methods for composite nondifferentiable optimization problems , 1985, Math. Program..
[12] Aaron Defazio,et al. A Simple Practical Accelerated Method for Finite Sums , 2016, NIPS.
[13] Aurélien Lucchi,et al. Variance Reduced Stochastic Gradient Descent with Neighbors , 2015, NIPS.
[14] Christoph Schnörr,et al. A study of Nesterov's scheme for Lagrangian decomposition and MAP labeling , 2011, CVPR 2011.
[15] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[16] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[17] Sham M. Kakade,et al. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization , 2015, ICML.
[18] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[19] Zaïd Harchaoui,et al. Catalyst for Gradient-based Nonconvex Optimization , 2018, AISTATS.
[20] D. Greig,et al. Exact Maximum A Posteriori Estimation for Binary Images , 1989 .
[21] Stephen Gould,et al. Accelerated dual decomposition for MAP inference , 2010, ICML.
[22] Anton Osokin,et al. Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs , 2016, ICML.
[23] Erik F. Tjong Kim Sang,et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.
[24] Dimitri P. Bertsekas,et al. Nonlinear Programming , 1997 .
[25] Pushmeet Kohli,et al. Measuring uncertainty in graph cut solutions , 2008, Comput. Vis. Image Underst..
[26] Christoph H. Lampert,et al. Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[27] Jung-Fu Cheng,et al. Turbo Decoding as an Instance of Pearl's "Belief Propagation" Algorithm , 1998, IEEE J. Sel. Areas Commun..
[28] Ben Taskar,et al. Structured Prediction, Dual Extragradient and Bregman Projections , 2006, J. Mach. Learn. Res..
[29] Ben Taskar,et al. Max-Margin Markov Networks , 2003, NIPS.
[30] Dmitriy Drusvyatskiy,et al. Stochastic model-based minimization of weakly convex functions , 2018, SIAM J. Optim..
[31] Amir Globerson,et al. An LP View of the M-best MAP problem , 2009, NIPS.
[32] Tamir Hazan,et al. Blending Learning and Inference in Conditional Random Fields , 2016, J. Mach. Learn. Res..
[33] Tong Zhang,et al. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Math. Program..
[34] Vladimir Kolmogorov,et al. What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[36] Thomas Hofmann,et al. Support vector machine learning for interdependent and structured output spaces , 2004, ICML.
[37] Mark Jerrum,et al. Polynomial-Time Approximation Algorithms for the Ising Model , 1990, SIAM J. Comput..
[38] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.
[39] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[40] Thorsten Joachims,et al. Cutting-plane training of structural SVMs , 2009, Machine Learning.
[41] Mark Steedman,et al. A* CCG Parsing with a Supertag-factored Model , 2014, EMNLP.
[42] Maksim Tkatchenko,et al. Named entity recognition: Exploring features , 2012, KONVENS.
[43] Arkadi Nemirovski,et al. Dual subgradient algorithms for large-scale nonsmooth learning problems , 2013, Math. Program..
[44] Xinhua Zhang,et al. Accelerated training of max-margin Markov networks with kernels , 2011, Theor. Comput. Sci..
[45] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..
[46] Michael I. Jordan,et al. Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.
[47] Mark W. Schmidt,et al. Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.
[48] Peter L. Bartlett,et al. Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..
[49] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[50] Christoph H. Lampert,et al. Computing the M Most Probable Modes of a Graphical Model , 2013, AISTATS.
[51] László Lovász,et al. Submodular functions and convexity , 1982, ISMP.
[52] Zaïd Harchaoui,et al. A Universal Catalyst for First-Order Optimization , 2015, NIPS.
[53] R. Bellman. Dynamic programming. , 1957, Science.
[54] A. P. Dawid,et al. Applications of a general propagation algorithm for probabilistic expert systems , 1992 .
[55] Patrick Gallinari,et al. A Framework for the Cooperation of Learning Algorithms , 1990, NIPS.
[56] Zaïd Harchaoui,et al. Semi-Proximal Mirror-Prox for Nonsmooth Composite Minimization , 2015, NIPS.
[57] Tamir Hazan,et al. A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction , 2010, NIPS.
[58] Yurii Nesterov,et al. Smooth minimization of non-smooth functions , 2005, Math. Program..
[59] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[60] Francis R. Bach,et al. Stochastic Variance Reduction Methods for Saddle-Point Problems , 2016, NIPS.
[61] Martin J. Wainwright,et al. MAP estimation via agreement on trees: message-passing and linear programming , 2005, IEEE Transactions on Information Theory.
[62] Davi Geiger,et al. Segmentation by grouping junctions , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[63] Jason K. Johnson,et al. Convex relaxation methods for graphical models: Lagrangian and maximum entropy approaches , 2008 .
[64] Allen R. Hanson,et al. Maximum-weight bipartite matching technique and its application in image feature matching , 1996, Other Conferences.
[65] Gregory F. Cooper,et al. The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks , 1990, Artif. Intell..
[66] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..
[67] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[68] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.
[69] Ramón Fernández Astudillo,et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification , 2016, ICML.
[70] Y. Weiss,et al. Finding the M Most Probable Configurations using Loopy Belief Propagation , 2003, NIPS 2003.
[71] Mark W. Schmidt,et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method , 2012, ArXiv.
[72] Yurii Nesterov,et al. Excessive Gap Technique in Nonsmooth Convex Minimization , 2005, SIAM J. Optim..
[73] D. Nilsson,et al. An efficient algorithm for finding the M most probable configurations in probabilistic expert systems , 1998, Stat. Comput..
[74] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[75] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[76] Tommi S. Jaakkola,et al. Convergence Rate Analysis of MAP Coordinate Minimization Algorithms , 2012, NIPS.
[77] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[78] Dhruv Batra,et al. An Efficient Message-Passing Algorithm for the M-Best MAP Problem , 2012, UAI.
[79] Andrew McCallum,et al. Structured Prediction Energy Networks , 2015, ICML.
[80] Philip Wolfe,et al. Validation of subgradient optimization , 1974, Math. Program..
[81] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[82] James H. Martin,et al. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.
[83] Zeyuan Allen-Zhu,et al. Katyusha: the first direct acceleration of stochastic gradient methods , 2016, J. Mach. Learn. Res..
[84] Daniel Marcu,et al. Learning as search optimization: approximate large margin methods for structured prediction , 2005, ICML.
[85] Koby Crammer,et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..
[86] Jean-Louis Golmard,et al. An algorithm directly finding the K most probable configurations in Bayesian networks , 1994, Int. J. Approx. Reason..
[87] Dmitriy Drusvyatskiy,et al. Efficiency of minimizing compositions of convex functions and smooth maps , 2016, Math. Program..
[88] Mark W. Schmidt,et al. Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields , 2015, AISTATS.
[89] Shai Shalev-Shwartz,et al. Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..
[90] Nathan Srebro,et al. Tight Complexity Bounds for Optimizing Composite Objectives , 2016, NIPS.
[91] Claire Cardie,et al. SparseMAP: Differentiable Sparse Structured Inference , 2018, ICML.
[92] Daniel Tarlow,et al. Using Combinatorial Optimization within Max-Product Belief Propagation , 2006, NIPS.
[93] Yoshua Bengio,et al. LeRec: A NN/HMM Hybrid for On-Line Handwriting Recognition , 1995, Neural Computation.
[94] Luke S. Zettlemoyer,et al. Deep Semantic Role Labeling: What Works and What’s Next , 2017, ACL.
[95] Julien Mairal,et al. Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning , 2014, SIAM J. Optim..
[96] Gregory Shakhnarovich,et al. Diverse M-Best Solutions in Markov Random Fields , 2012, ECCV.
[97] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets , 2012, ArXiv.
[98] Yoshua Bengio,et al. Global training of document processing systems using graph transformer networks , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[99] Zaïd Harchaoui,et al. Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice , 2017, J. Mach. Learn. Res..
[100] Ben Taskar,et al. A Discriminative Matching Approach to Word Alignment , 2005, HLT.
[101] Alexander Schrijver,et al. Combinatorial optimization. Polyhedra and efficiency. , 2003 .
[102] Marc Teboulle,et al. Smoothing and First Order Methods: A Unified Framework , 2012, SIAM J. Optim..