Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms
[1] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological Review.
[2] Eric P. Xing,et al. Nonextensive Information Theoretic Kernels on Measures , 2009, J. Mach. Learn. Res..
[3] J. Borwein,et al. Convex Analysis And Nonlinear Optimization , 2000 .
[4] W. Berger,et al. Diversity of Planktonic Foraminifera in Deep-Sea Sediments , 1970, Science.
[5] 丸山 徹 (Toru Maruyama). On a few developments in convex analysis (in Japanese) , 1977 .
[6] Tong Zhang,et al. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.
[7] J. F. C. Kingman,et al. Information and Exponential Families in Statistical Theory , 1980 .
[8] Claude E. Shannon,et al. A mathematical theory of communication , 1948, Bell System Technical Journal.
[9] Mark D. Reid,et al. Composite Multiclass Losses , 2011, J. Mach. Learn. Res..
[10] Michael Collins,et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.
[11] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[12] Vlad Niculae,et al. A Regularized Framework for Sparse and Structured Neural Attention , 2017, NIPS.
[13] Marc Teboulle,et al. Smoothing and First Order Methods: A Unified Framework , 2012, SIAM J. Optim..
[14] Hamed Masnadi-Shirazi. The design of Bayes consistent loss functions for classification , 2011 .
[15] Arthur Mensch,et al. Differentiable Dynamic Programming for Structured Prediction and Attention , 2018, ICML.
[16] K. Schittkowski,et al. Nonlinear Programming , 2022 .
[17] Inderjit S. Dhillon,et al. Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..
[18] Martin Jaggi,et al. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.
[19] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008 .
[20] Heinz H. Bauschke,et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces , 2011, CMS Books in Mathematics.
[21] Alexandre M. Bayen,et al. Efficient Bregman projections onto the simplex , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).
[22] O. Mangasarian. Pseudo-Convex Functions , 1965 .
[23] Alexander J. Smola,et al. Learning with kernels , 1998 .
[24] Koby Crammer,et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..
[25] A. Dawid,et al. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory , 2004, math/0410076.
[26] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[27] Ramón Fernández Astudillo,et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification , 2016, ICML.
[28] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics , 1988 .
[29] C. Tsallis,et al. Nonextensive Entropy: Interdisciplinary Applications , 2004 .
[30] J. Danskin. The Theory of Max-Min, with Applications , 1966 .
[31] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..
[32] G. Brier. Verification of Forecasts Expressed in Terms of Probability , 1950 .
[33] Yurii Nesterov,et al. Smooth minimization of non-smooth functions , 2005, Math. Program..
[34] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .
[35] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[36] Laurent Condat. Fast projection onto the simplex and the l1 ball , 2015, Mathematical Programming.
[37] P. Brucker. Review of recent development: An O( n) algorithm for quadratic knapsack problems , 1984 .
[38] F. Opitz. Information geometry and its applications , 2012, 2012 9th European Radar Conference.
[39] A. Raftery,et al. Strictly Proper Scoring Rules, Prediction, and Estimation , 2007 .
[40] K. Ball,et al. Sharp uniform convexity and smoothness inequalities for trace norms , 1994 .
[41] Yoram Singer,et al. Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.
[42] Claire Cardie,et al. SparseMAP: Differentiable Sparse Structured Inference , 2018, ICML.
[43] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[44] Yann Guermeur,et al. VC Theory of Large Margin Multi-Category Classifiers , 2007, J. Mach. Learn. Res..
[45] Stephen J. Wright,et al. Sparse reconstruction by separable approximation , 2009, IEEE Trans. Signal Process..
[46] L. J. Savage. Elicitation of Personal Probabilities and Expectations , 1971 .
[47] Manfred K. Warmuth,et al. Two-temperature logistic regression based on the Tsallis divergence , 2017, AISTATS.
[48] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[49] Hiroki Suyari. Generalization of Shannon-Khinchin axioms to nonextensive systems and the uniqueness theorem for the nonextensive entropy , 2004, IEEE Transactions on Information Theory.
[50] C. Gini. Variabilità e Mutabilità , 1913 .
[51] G. Crooks. On Measures of Entropy and Information , 2015 .
[52] André F. T. Martins,et al. Learning with Fenchel-Young Losses , 2020, J. Mach. Learn. Res..
[53] M. Degroot. Uncertainty, Information, and Sequential Experiments , 1962 .
[54] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[55] Mark D. Reid,et al. Composite Binary Losses , 2009, J. Mach. Learn. Res..
[56] A. Buja,et al. Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications , 2005 .
[57] Frank Nielsen,et al. Bregman Divergences and Surrogates for Learning , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.