SKCompress: compressing sparse and nonuniform gradient in distributed machine learning
Jiawei Jiang | Fangcheng Fu | Tong Yang | Yingxia Shao | Bin Cui