Differentiable Dynamic Programming for Structured Prediction and Attention
[1] Robert A. Sulanke,et al. Objects Counted by the Central Delannoy Numbers , 2003 .
[2] A. Fiacco. A Finite Algorithm for Finding the Projection of a Point onto the Canonical Simplex of R^n , 2009 .
[3] L. Baum,et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .
[4] Yoram Singer,et al. Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.
[5] Claire Cardie,et al. SparseMAP: Differentiable Sparse Structured Inference , 2018, ICML.
[6] S. Verdú,et al. Abstract dynamic programming models under commutativity conditions , 1987 .
[7] David A. Smith,et al. Minimum Risk Annealing for Training Log-Linear Models , 2006, ACL.
[8] J. Zico Kolter,et al. OptNet: Differentiable Optimization as a Layer in Neural Networks , 2017, ICML.
[9] D. Bertsekas. Control of uncertain systems with a set-membership description of the uncertainty , 1971 .
[10] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[11] J. Danskin. The Theory of Max-Min, with Applications , 1966 .
[12] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..
[13] Vlad Niculae,et al. A Regularized Framework for Sparse and Structured Neural Attention , 2017, NIPS.
[14] Cyril Banderier,et al. Why Delannoy numbers? , 2004, ArXiv.
[15] Marc Teboulle,et al. Smoothing and First Order Methods: A Unified Framework , 2012, SIAM J. Optim..
[16] S. Chiba,et al. Dynamic programming algorithm optimization for spoken word recognition , 1978 .
[17] R. Bellman,et al. On the Theory of Dynamic Programming , 1952, Proceedings of the National Academy of Sciences of the United States of America.
[18] Eduard H. Hovy,et al. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF , 2016, ACL.
[19] Alexander M. Rush,et al. Structured Attention Networks , 2017, ICLR.
[20] Vivien Seguy,et al. Smooth and Sparse Optimal Transport , 2017, AISTATS.
[21] Robert J. McEliece,et al. The generalized distributive law , 2000, IEEE Trans. Inf. Theory.
[22] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[23] Damien Garreau,et al. Metric Learning for Temporal Sequence Alignment , 2014, NIPS.
[24] Andrew McCallum,et al. An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..
[25] Fu Jie Huang,et al. A Tutorial on Energy-Based Learning , 2006 .
[26] Andreas Krause,et al. Differentiable Learning of Submodular Functions , 2017, NIPS.
[27] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[28] Yoshua Bengio,et al. Global training of document processing systems using graph transformer networks , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[29] C. Michelot. A finite algorithm for finding the projection of a point onto the canonical simplex of R^n , 1986 .
[30] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[31] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[32] Veselin Stoyanov,et al. Minimum-Risk Training of Approximate CRF-Based NLP Systems , 2012, NAACL.
[33] Ofer Meshi,et al. Smooth and Strong: MAP Inference with Linear Convergence , 2015, NIPS.
[34] Marco Cuturi,et al. Soft-DTW: a Differentiable Loss Function for Time-Series , 2017, ICML.
[35] J. Moreau. Proximité et dualité dans un espace hilbertien , 1965 .
[36] Ramón Fernández Astudillo,et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification , 2016, ICML.
[37] Jason Eisner,et al. Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper) , 2016, SPNLP@EMNLP.
[38] Gökhan Bakır,et al. Predicting Structured Data , 2008 .
[39] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.
[40] Bryan Pardo,et al. Soundprism: An Online System for Score-Informed Source Separation of Music Audio , 2011, IEEE Journal of Selected Topics in Signal Processing.
[41] Joan Bruna,et al. Divide and Conquer Networks , 2016, ICLR.
[42] Eszter Gselmann. Entropy functions and functional equations , 2011 .
[43] Guillaume Lample,et al. Neural Architectures for Named Entity Recognition , 2016, NAACL.
[44] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..
[45] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[46] Yurii Nesterov,et al. Smooth minimization of non-smooth functions , 2005, Math. Program..
[47] Michael Collins,et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.
[48] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[49] Graham Neubig,et al. A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models , 2017, AAAI.
[50] Matthijs Douze,et al. FastText.zip: Compressing text classification models , 2016, ArXiv.
[51] T. Lindvall. On a Routing Problem , 2004, Probability in the Engineering and Informational Sciences.