暂无分享,去创建一个
[1] F. L. Hitchcock. The Expression of a Tensor or a Polyadic as a Sum of Products , 1927 .
[2] C. G. Broyden. A Class of Methods for Solving Nonlinear Simultaneous Equations , 1965 .
[3] Wonyong Sung,et al. Structured Pruning of Deep Convolutional Neural Networks , 2015, ACM J. Emerg. Technol. Comput. Syst..
[4] Rongrong Ji,et al. Accelerating Convolutional Networks via Global & Dynamic Filter Pruning , 2018, IJCAI.
[5] Eunhyeok Park,et al. Value-aware Quantization for Training and Inference of Neural Networks , 2018, ECCV.
[6] Giovanna Castellano,et al. An iterative pruning algorithm for feedforward neural networks , 1997, IEEE Trans. Neural Networks.
[7] Dmitry P. Vetrov,et al. Variational Dropout Sparsifies Deep Neural Networks , 2017, ICML.
[8] Asifullah Khan,et al. A survey of the recent architectures of deep convolutional neural networks , 2019, Artificial Intelligence Review.
[9] Yixin Chen,et al. Compressing Neural Networks with the Hashing Trick , 2015, ICML.
[10] Suyog Gupta,et al. To prune, or not to prune: exploring the efficacy of pruning for model compression , 2017, ICLR.
[11] S. P. Lloyd,et al. Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.
[12] Ehud D. Karnin,et al. A simple procedure for pruning back-propagation trained neural networks , 1990, IEEE Trans. Neural Networks.
[13] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Ebru Arisoy,et al. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Yue Wang,et al. Drawing early-bird tickets: Towards more efficient training of deep networks , 2019, ICLR.
[16] Adam Gaier,et al. Weight Agnostic Neural Networks , 2019, NeurIPS.
[17] Babak Hassibi,et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.
[18] Lukasz Kaiser,et al. Universal Transformers , 2018, ICLR.
[19] Mário A. T. Figueiredo,et al. Learning to Share: simultaneous parameter tying and Sparsification in Deep Learning , 2018, ICLR.
[20] Anahita Bhiwandiwalla,et al. Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks , 2020, ICLR.
[21] James T. Kwok,et al. Loss-aware Binarization of Deep Networks , 2016, ICLR.
[22] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[23] Timo Aila,et al. Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning , 2016, ArXiv.
[24] Xin Wang,et al. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization , 2019, ICML.
[25] Xianglong Liu,et al. Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Gerhard Rigoll,et al. Convolutional Neural Networks with Layer Reuse , 2019, 2019 IEEE International Conference on Image Processing (ICIP).
[27] Timo Aila,et al. Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.
[28] Kyoung Mu Lee,et al. Deeply-Recursive Convolutional Network for Image Super-Resolution , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[30] Phillip Isola,et al. Contrastive Representation Distillation , 2020, ICLR.
[31] Luca Benini,et al. Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations , 2017, NIPS.
[32] Kalyanmoy Deb,et al. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms , 1990, FOGA.
[33] Yan Lu,et al. Relational Knowledge Distillation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Shingo Mabu,et al. Enhancing the generalization ability of neural networks through controlling the hidden layers , 2009, Appl. Soft Comput..
[35] David Kappel,et al. Deep Rewiring: Training very sparse deep networks , 2017, ICLR.
[36] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[37] S. Liberty,et al. Linear Systems , 2010, Scientific Parallel Computing.
[38] David P. Wipf,et al. Compressing Neural Networks using the Variational Information Bottleneck , 2018, ICML.
[39] Lin Xu,et al. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights , 2017, ICLR.
[40] Hanan Samet,et al. Pruning Filters for Efficient ConvNets , 2016, ICLR.
[41] Dichao Hu,et al. An Introductory Survey on Attention Mechanisms in NLP Problems , 2018, IntelliSys.
[42] Wee Kheng Leow,et al. Pruned Neural Networks for Regression , 2000, PRICAI.
[43] Michael T. Manry,et al. An integrated growing-pruning method for feedforward network training , 2008, Neurocomputing.
[44] Brian Kingsbury,et al. Knowledge distillation across ensembles of multilingual models for low-resource languages , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Ivan Oseledets,et al. Tensor-Train Decomposition , 2011, SIAM J. Sci. Comput..
[46] Russell Reed,et al. Pruning algorithms-a survey , 1993, IEEE Trans. Neural Networks.
[47] Naftali Tishby,et al. Opening the Black Box of Deep Neural Networks via Information , 2017, ArXiv.
[48] Dan Alistarh,et al. Model compression via distillation and quantization , 2018, ICLR.
[49] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[50] Yaroslav Bulatov,et al. Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.
[51] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[52] Elad Hoffer,et al. ACIQ: Analytical Clipping for Integer Quantization of neural networks , 2018, ArXiv.
[53] Nikos Komodakis,et al. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.
[54] Mikhail Belkin,et al. Two models of double descent for weak features , 2019, SIAM J. Math. Data Sci..
[55] Yisong Yue,et al. Long-term Forecasting using Tensor-Train RNNs , 2017, ArXiv.
[56] Yoshua Bengio,et al. Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation , 2016, Front. Comput. Neurosci..
[57] Rohan Ramanath,et al. An Attentive Survey of Attention Models , 2019, ACM Trans. Intell. Syst. Technol..
[58] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[59] Mohammad Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[60] Rui Zhang,et al. KDGAN: Knowledge Distillation with Generative Adversarial Networks , 2018, NeurIPS.
[61] Wei Liu,et al. Neural Compatibility Modeling with Attentive Knowledge Distillation , 2018, SIGIR.
[62] David D. Cox,et al. On the information bottleneck theory of deep learning , 2018, ICLR.
[63] Dipankar Das,et al. Mixed Precision Training With 8-bit Floating Point , 2019, ArXiv.
[64] Graham Neubig,et al. Understanding Knowledge Distillation in Non-autoregressive Machine Translation , 2019, ICLR.
[65] Swagath Venkataramani,et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks , 2018, ArXiv.
[66] Michael Maire,et al. Learning Implicitly Recurrent CNNs Through Parameter Sharing , 2019, ICLR.
[67] Pradeep Dubey,et al. A Study of BFLOAT16 for Deep Learning Training , 2019, ArXiv.
[68] Tie-Yan Liu,et al. Neural Architecture Optimization , 2018, NeurIPS.
[69] Hang Li,et al. Convolutional Neural Network Architectures for Matching Natural Language Sentences , 2014, NIPS.
[70] Vladlen Koltun,et al. Deep Equilibrium Models , 2019, NeurIPS.
[71] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.
[72] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[73] Svetlana Lazebnik,et al. Piggyback: Adding Multiple Tasks to a Single, Fixed Network by Learning to Mask , 2018, ArXiv.
[74] Pradeep Dubey,et al. Mixed Precision Training of Convolutional Neural Networks using Integer Operations , 2018, ICLR.
[75] Michael C. Mozer,et al. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment , 1988, NIPS.
[76] Jin Young Choi,et al. Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons , 2018, AAAI.
[77] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[78] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[79] Harri Valpola,et al. Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.
[80] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[81] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[82] Yun Fu,et al. Residual Dense Network for Image Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[83] Yaim Cooper,et al. The loss landscape of overparameterized neural networks , 2018, ArXiv.
[84] Rémi Gribonval,et al. And the Bit Goes Down: Revisiting the Quantization of Neural Networks , 2019, ICLR.
[85] Thad Starner,et al. Data-Free Knowledge Distillation for Deep Neural Networks , 2017, ArXiv.
[86] Sangwook Cho,et al. Understanding Knowledge Distillation , 2020 .
[87] Qun Liu,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2020, EMNLP.
[88] Andries Petrus Engelbrecht,et al. A new pruning heuristic based on variance analysis of sensitivity information , 2001, IEEE Trans. Neural Networks.
[89] Vincent Lepetit,et al. Learning Separable Filters , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[90] Zhe Gan,et al. Distilling Knowledge Learned in BERT for Text Generation , 2019, ACL.
[91] Vladlen Koltun,et al. Trellis Networks for Sequence Modeling , 2018, ICLR.
[92] Niraj K. Jha,et al. NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm , 2017, IEEE Transactions on Computers.
[93] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[94] Surya Ganguli,et al. Pruning neural networks without any data by iteratively conserving synaptic flow , 2020, NeurIPS.
[95] Yurong Chen,et al. Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[96] Yifan Gong,et al. Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Yann LeCun,et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks , 2018, ArXiv.
[98] Ian H. Witten,et al. Data Compression Using Adaptive Coding and Partial String Matching , 1984, IEEE Trans. Commun..
[99] Alfred Jean Philippe Lauret,et al. A node pruning algorithm based on a Fourier amplitude sensitivity test method , 2006, IEEE Transactions on Neural Networks.
[100] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[101] Xin Dong,et al. Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon , 2017, NIPS.
[102] Max Welling,et al. Bayesian Compression for Deep Learning , 2017, NIPS.
[103] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[104] Song Han,et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.
[105] Xingrui Yu,et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels , 2018, NeurIPS.
[106] Dharmendra S. Modha,et al. Deep neural networks are robust to weight binarization and other non-linear distortions , 2016, ArXiv.
[107] Boaz Barak,et al. Deep double descent: where bigger models and more data hurt , 2019, ICLR.
[108] Yifan Gong,et al. Restructuring of deep neural network acoustic models with singular value decomposition , 2013, INTERSPEECH.
[109] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[110] Max Welling,et al. Soft Weight-Sharing for Neural Network Compression , 2017, ICLR.
[111] Michael W. Mahoney,et al. Exact expressions for double descent and implicit regularization via surrogate random design , 2019, NeurIPS.
[112] Yu Cheng,et al. Patient Knowledge Distillation for BERT Model Compression , 2019, EMNLP.
[113] Yoshua Bengio,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.
[114] Gintare Karolina Dziugaite,et al. The Lottery Ticket Hypothesis at Scale , 2019, ArXiv.
[115] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[116] Quoc V. Le,et al. Improved Noisy Student Training for Automatic Speech Recognition , 2020, INTERSPEECH.
[117] Daniel Brand,et al. Training Deep Neural Networks with 8-bit Floating Point Numbers , 2018, NeurIPS.
[118] Yiming Yang,et al. DARTS: Differentiable Architecture Search , 2018, ICLR.
[119] Luke Zettlemoyer,et al. Sparse Networks from Scratch: Faster Training without Losing Performance , 2019, ArXiv.
[120] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[121] Riccardo Poli,et al. Particle swarm optimization , 1995, Swarm Intelligence.
[122] Torsten Hoefler,et al. Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning , 2020, ArXiv.
[123] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.
[124] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[125] Andrew Zisserman,et al. Speeding up Convolutional Neural Networks with Low Rank Expansions , 2014, BMVC.
[126] Christoph H. Lampert,et al. Towards Understanding Knowledge Distillation , 2019, ICML.
[127] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[128] Timo Aila,et al. Pruning Convolutional Neural Networks for Resource Efficient Inference , 2016, ICLR.
[129] Mingkui Tan,et al. NAT: Neural Architecture Transformer for Accurate and Compact Architectures , 2019, NeurIPS.
[130] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[131] Yann LeCun,et al. Understanding Deep Architectures using a Recursive Convolutional Network , 2013, ICLR.
[132] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[133] Dustin Tran,et al. Image Transformer , 2018, ICML.
[134] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[135] Guokun Lai,et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations , 2017, EMNLP.
[136] Srinidhi Hegde,et al. Variational Student: Learning Compact and Sparser Networks In Knowledge Distillation Framework , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[137] Lieven De Lathauwer,et al. Decompositions of a Higher-Order Tensor in Block Terms - Part II: Definitions and Uniqueness , 2008, SIAM J. Matrix Anal. Appl..
[138] Soheil Ghiasi,et al. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks , 2018, IEEE Transactions on Neural Networks and Learning Systems.
[139] Mingjie Sun,et al. Rethinking the Value of Network Pruning , 2018, ICLR.
[140] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.
[141] Ali Farhadi,et al. What’s Hidden in a Randomly Weighted Neural Network? , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[142] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[143] Jingbo Zhu,et al. Sharing Attention Weights for Fast Transformer , 2019, IJCAI.
[144] Yale Song,et al. Learning from Noisy Labels with Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[145] Geoffrey E. Hinton,et al. When Does Label Smoothing Help? , 2019, NeurIPS.
[146] Geoffrey E. Hinton,et al. Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.
[147] Seyed Iman Mirzadeh,et al. Improved Knowledge Distillation via Teacher Assistant , 2020, AAAI.
[148] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[149] Jiwen Lu,et al. Runtime Neural Pruning , 2017, NIPS.
[150] Vineeth N. Balasubramanian,et al. Deep Model Compression: Distilling Knowledge from Noisy Teachers , 2016, ArXiv.
[151] Yevgen Chebotar,et al. Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition , 2016, INTERSPEECH.
[152] Yiming Hu,et al. A novel channel pruning method for deep neural network compression , 2018, ArXiv.
[153] Nicholas Rhinehart,et al. N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning , 2017, ICLR.
[154] Victor S. Lempitsky,et al. Fast ConvNets Using Group-Wise Brain Damage , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[155] L. Darrell Whitley,et al. Genetic algorithms and neural networks: optimizing connections and connectivity , 1990, Parallel Comput..
[156] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[157] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[158] Qiang Liu,et al. On the Margin Theory of Feedforward Neural Networks , 2018, ArXiv.
[159] Christopher A. Walsh,et al. Peter Huttenlocher (1931–2013) , 2013, Nature.
[160] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.
[161] Kurt Keutzer,et al. HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[162] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[163] Raghuraman Krishnamoorthi,et al. Quantizing deep convolutional networks for efficient inference: A whitepaper , 2018, ArXiv.
[164] James G. Scott,et al. The horseshoe estimator for sparse signals , 2010 .
[165] Kilian Q. Weinberger,et al. Feature hashing for large scale multitask learning , 2009, ICML '09.
[166] Hadi Esmaeilzadeh,et al. ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks , 2018 .
[167] Kurt Keutzer,et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2020, AAAI.
[168] Zenglin Xu,et al. Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[169] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[170] Hassan Ghasemzadeh,et al. Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher , 2019, ArXiv.
[171] Peter Stone,et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science , 2017, Nature Communications.
[172] L. Tucker,et al. Some mathematical notes on three-mode factor analysis , 1966, Psychometrika.
[173] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[174] Rich Caruana,et al. Do Deep Nets Really Need to be Deep? , 2013, NIPS.
[175] Xiaodong Liu,et al. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding , 2019, ArXiv.
[176] Hadi Esmaeilzadeh,et al. ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks , 2018, ArXiv.
[177] Zhe Gan,et al. Distilling the Knowledge of BERT for Text Generation , 2019, ArXiv.
[178] Song Han,et al. Trained Ternary Quantization , 2016, ICLR.
[179] Wei Pan,et al. Towards Accurate Binary Convolutional Neural Network , 2017, NIPS.
[180] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[181] Philip H. S. Torr,et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity , 2018, ICLR.
[182] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[183] Nathan Halko,et al. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..
[184] Christian Lebiere,et al. The Cascade-Correlation Learning Architecture , 1989, NIPS.
[185] John E. Moody,et al. Fast Pruning Using Principal Components , 1993, NIPS.
[186] Xu Lan,et al. Knowledge Distillation by On-the-Fly Native Ensemble , 2018, NeurIPS.
[187] Greg Mori,et al. Similarity-Preserving Knowledge Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[188] Masafumi Hagiwara,et al. Removal of hidden units and weights for back propagation networks , 1993, Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan).
[189] Bin Liu,et al. Ternary Weight Networks , 2016, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[190] Prem Raj Adhikari,et al. Multiresolution Mixture Modeling using Merging of Mixture Components , 2012, ACML.
[191] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[192] Soheil Feizi,et al. Compressing GANs using Knowledge Distillation , 2019, ArXiv.
[193] Michael Carbin,et al. The Lottery Ticket Hypothesis: Training Pruned Neural Networks , 2018, ArXiv.
[194] Junjie Yan,et al. Dynamic Recursive Neural Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[195] Jian Yang,et al. Image Super-Resolution via Deep Recursive Residual Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[196] Jang Hyun Cho,et al. On the Efficacy of Knowledge Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[197] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[198] Il-Chul Moon,et al. Adversarial Dropout for Supervised and Semi-supervised Learning , 2017, AAAI.
[199] Tim Dettmers,et al. 8-Bit Approximations for Parallelism in Deep Learning , 2015, ICLR.
[200] Yoshua Bengio,et al. Training deep neural networks with low precision multiplications , 2014 .
[201] Markus Freitag,et al. Ensemble Distillation for Neural Machine Translation , 2017, ArXiv.
[202] Liujuan Cao,et al. Towards Optimal Structured CNN Pruning via Generative Adversarial Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[203] Robert M. Gray,et al. Speech coding based upon vector quantization , 1980, ICASSP.
[204] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[205] Alexander Novikov,et al. Tensorizing Neural Networks , 2015, NIPS.
[206] Bo Chen,et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[207] Ariel D. Procaccia,et al. Variational Dropout and the Local Reparameterization Trick , 2015, NIPS.
[208] Edouard Grave,et al. Training with Quantization Noise for Extreme Model Compression , 2020, ICLR.
[209] Cordelia Schmid,et al. Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[210] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[211] Rich Caruana,et al. Model compression , 2006, KDD '06.
[212] John Langford,et al. Hash Kernels for Structured Data , 2009, J. Mach. Learn. Res..
[213] Gregory J. Wolff,et al. Optimal Brain Surgeon: Extensions and performance comparisons , 1993, NIPS 1993.
[214] Atsushi Fujita,et al. Recurrent Stacking of Layers for Compact Neural Machine Translation Models , 2018, AAAI.
[215] Asit K. Mishra,et al. Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy , 2017, ICLR.
[216] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[217] Alexander M. Rush,et al. Sequence-Level Knowledge Distillation , 2016, EMNLP.
[218] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[219] Dacheng Tao,et al. On Compressing Deep Models by Low Rank and Sparse Decomposition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[220] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.
[221] Jun-Fei Qiao,et al. A structure optimisation algorithm for feedforward neural network construction , 2013, Neurocomputing.
[222] Fei Han,et al. A Neural Network Pruning Method Optimized with PSO Algorithm , 2010, 2010 Second International Conference on Computer Modeling and Simulation.
[223] Lucas Theis,et al. Faster gaze prediction with dense networks and Fisher pruning , 2018, ArXiv.
[224] Eriko Nurvitadhi,et al. WRPN: Wide Reduced-Precision Networks , 2017, ICLR.
[225] Olatunji Ruwase,et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models , 2019, SC.
[226] Parul Parashar,et al. Neural Networks in Machine Learning , 2014 .
[227] Lihi Zelnik-Manor,et al. ASAP: Architecture Search, Anneal and Prune , 2019, AISTATS.
[228] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[229] Erick Cantú-Paz. Pruning Neural Networks with Distribution Estimation Algorithms , 2003, GECCO.
[230] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[231] Larry S. Davis,et al. NISP: Pruning Networks Using Neuron Importance Score Propagation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[232] Richard F. Lyon,et al. Neural Networks for Machine Learning , 2017 .
[233] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.