[1] Tongfeng Sun et al. Review of classical dimensionality reduction and sample selection methods for large-scale data processing, 2019, Neurocomputing.
[2] Marc'Aurelio Ranzato et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[3] Ananda Theertha Suresh et al. Distributed Mean Estimation with Limited Communication, 2016, ICML.
[4] Shai Ben-David et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[5] Allen Gersho et al. Vector Quantization and Signal Compression, 1991, The Kluwer International Series in Engineering and Computer Science.
[6] Martin Jaggi et al. Sparsified SGD with Memory, 2018, NeurIPS.
[7] Martin J. Wainwright et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization, 2010, IEEE Transactions on Information Theory.
[8] Jemin George et al. SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization, 2020, IEEE Journal on Selected Areas in Information Theory.
[9] Dan Alistarh et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[10] Nikko Strom et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[11] Kenneth Heafield et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[12] William Feller et al. An Introduction to Probability Theory and Its Applications, Vol. 2, 1967.
[13] Matthijs Douze et al. Fixing the train-test resolution discrepancy, 2019, NeurIPS.
[14] Junzhou Huang et al. Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization, 2018, ICML.
[15] Dan Alistarh et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[16] Daniel S. Yeung et al. Input sample selection for RBF neural network classification problems using sensitivity measure, 2003, SMC'03 Conference Proceedings, 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme: System Security and Assurance (Cat. No.03CH37483).
[17] Samyadeep Basu et al. Influence Functions in Deep Learning Are Fragile, 2020, ICLR.
[18] Prathamesh Mayekar et al. RATQ: A Universal Fixed-Length Quantizer for Stochastic Optimization, 2019, IEEE Transactions on Information Theory.
[19] Cong Xu et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[20] Dan Alistarh et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[21] R. Dennis Cook et al. Detection of Influential Observation in Linear Regression, 2000, Technometrics.
[22] Percy Liang et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.
[23] Alok Aggarwal et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.
[24] William J. Dally et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[25] Sébastien Bubeck et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[26] Geoffrey Zweig et al. An introduction to computational networks and the computational network toolkit (invited talk), 2014, INTERSPEECH.
[27] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[28] Dong Yu et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[29] Suhas Diggavi et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations, 2019, IEEE Journal on Selected Areas in Information Theory.
[30] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Saeed Ghadimi et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[32] Raj Kumar Maity et al. vqSGD: Vector Quantized Stochastic Gradient Descent, 2019, IEEE Transactions on Information Theory.
[33] Li Fei-Fei et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[34] R. Gray et al. Dithered Quantizers, 1993, Proceedings. 1991 IEEE International Symposium on Information Theory.
[35] Kamyar Azizzadenesheli et al. signSGD: Compressed Optimisation for Non-Convex Problems, 2018, ICML.
[36] F. Hampel. The Influence Curve and Its Role in Robust Estimation, 1974.
[37] Kunle Olukotun et al. Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms, 2015, NIPS.