暂无分享,去创建一个
[1] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[2] Tom Heskes,et al. On Natural Learning and Pruning in Multilayered Perceptrons , 2000, Neural Computation.
[3] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[4] Jimmy Ba,et al. Kronecker-factored Curvature Approximations for Recurrent Neural Networks , 2018, ICLR.
[5] Nicol N. Schraudolph,et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.
[6] Xin Dong,et al. Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon , 2017, NIPS.
[7] Elman Mansimov,et al. Second-order Optimization for Deep Reinforcement Learning using Kronecker-factored Approximation , 2017, NIPS 2017.
[8] Pascal Vincent,et al. An Evaluation of Fisher Approximations Beyond Kronecker Factorization , 2018, ICLR.
[9] Suyog Gupta,et al. To prune, or not to prune: exploring the efficacy of pruning for model compression , 2017, ICLR.
[10] Martin Jaggi,et al. Model Fusion via Optimal Transport , 2019, NeurIPS.
[11] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Max Welling,et al. Learning Sparse Neural Networks through L0 Regularization , 2017, ICLR.
[13] Yurong Chen,et al. Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.
[14] Frederik Kunstner,et al. Limitations of the empirical Fisher approximation for natural gradient descent , 2019, NeurIPS.
[15] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[16] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[17] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[18] Roger B. Grosse,et al. A Kronecker-factored approximate Fisher matrix for convolution layers , 2016, ICML.
[19] Babak Hassibi,et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.
[20] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[21] Luke Zettlemoyer,et al. Sparse Networks from Scratch: Faster Training without Losing Performance , 2019, ArXiv.
[22] Lucas Theis,et al. Faster gaze prediction with dense networks and Fisher pruning , 2018, ArXiv.
[23] Dmitry P. Vetrov,et al. Variational Dropout Sparsifies Deep Neural Networks , 2017, ICML.
[24] Yixin Chen,et al. Compressing Neural Networks with the Hashing Trick , 2015, ICML.
[25] Sanja Fidler,et al. EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis , 2019, ICML.
[26] Hanan Samet,et al. Pruning Filters for Efficient ConvNets , 2016, ICLR.
[27] Raquel Urtasun,et al. MLPrune: Multi-Layer Pruning for Automated Neural Network Compression , 2018 .
[28] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[29] Yi-Ming Chan,et al. Unifying and Merging Well-trained Deep Neural Networks for Inference Stage , 2018, IJCAI.
[30] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.
[31] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[32] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[33] Michael C. Mozer,et al. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment , 1988, NIPS.
[34] Martin Jaggi,et al. Dynamic Model Pruning with Feedback , 2020, ICLR.
[35] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[36] Roger B. Grosse,et al. Distributed Second-Order Optimization using Kronecker-Factored Approximations , 2016, ICLR.
[37] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[38] Percy Liang,et al. Understanding Black-box Predictions via Influence Functions , 2017, ICML.
[39] Tao Zhang,et al. A Survey of Model Compression and Acceleration for Deep Neural Networks , 2017, ArXiv.
[40] Yann Dauphin,et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks , 2017, ICLR.
[41] Olatunji Ruwase,et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models , 2019, SC.
[42] Erich Elsen,et al. The State of Sparsity in Deep Neural Networks , 2019, ArXiv.
[43] Jianxin Wu,et al. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[44] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[45] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[46] James Martens,et al. New Insights and Perspectives on the Natural Gradient Method , 2014, J. Mach. Learn. Res..
[47] Rif A. Saurous,et al. Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks , 2017, ICLR.
[48] Satoshi Matsuoka,et al. Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Naman Agarwal,et al. Second-Order Stochastic Optimization for Machine Learning in Linear Time , 2016, J. Mach. Learn. Res..
[50] Miguel Á. Carreira-Perpiñán,et al. "Learning-Compression" Algorithms for Neural Net Pruning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[51] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Xin Wang,et al. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization , 2019, ICML.
[53] Yurii Nesterov,et al. Cubic regularization of Newton method and its global performance , 2006, Math. Program..