Exploring the Role of Loss Functions in Multiclass Classification
[1] Ankit Singh Rawat, et al. Sampled Softmax with Random Fourier Features, 2019, NeurIPS.
[2] Kurt Hornik, et al. Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.
[3] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[4] Mikhail Belkin, et al. Does data interpolation contradict statistical optimality?, 2018, AISTATS.
[5] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Roman Vershynin, et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[8] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[9] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[10] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst..
[11] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[12] Samet Oymak, et al. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?, 2018, ICML.
[13] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[14] Douglas Kline, et al. Revisiting squared-error and cross-entropy functions for training neural network classifiers, 2005, Neural Computing & Applications.
[15] Florent Perronnin, et al. Good Practice in Large-Scale Learning for Image Classification, 2014.
[16] Samet Oymak, et al. Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian, 2019, ArXiv.
[17] Mikhail Belkin, et al. Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, 2018, NeurIPS.
[18] Hervé Bourlard, et al. Connectionist Speech Recognition: A Hybrid Approach, 1993.
[19] Ambuj Tewari, et al. On the Consistency of Multiclass Classification Methods, 2007, J. Mach. Learn. Res..
[20] Pradeep Ravikumar, et al. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification, 2016, ICML.
[21] Andries Petrus Engelbrecht, et al. Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions, 2019, Neurocomputing.
[22] Hermann Ney, et al. Cross-entropy vs. squared error training: a theoretical and experimental comparison, 2013, INTERSPEECH.
[23] John G. Proakis, et al. Probability, random variables and stochastic processes, 1985, IEEE Trans. Acoust. Speech Signal Process..
[24] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[25] A. Choromańska. Extreme Multi Class Classification, 2013.
[26] P. S. Sastry, et al. Robust Loss Functions for Learning Multi-class Classifiers, 2018, 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
[27] Krzysztof Gajowniczek, et al. Generalized Entropy Cost Function in Neural Networks, 2017, ICANN.
[28] Dimitris Samaras, et al. Squared Earth Mover's Distance-based Loss for Training Deep Neural Networks, 2016, ArXiv.