Sparse and Continuous Attention Mechanisms
André F. T. Martins | Marcos Vinícius Treviso | Mário A. T. Figueiredo | P. Aguiar | Vlad Niculae | António Farinhas