Deep Gamblers: Learning to Abstain with Portfolio Theory
Ruslan Salakhutdinov | Louis-Philippe Morency | Paul Pu Liang | Liu Ziyin | Zhikang Wang | Masahito Ueda