Is Systematic Data Sharding Able to Stabilize Asynchronous Parameter Server Training?