Falcon: Addressing Stragglers in Heterogeneous Parameter Server Via Multiple Parallelism
Minyi Guo | Yanfei Sun | Song Guo | Kun Wang | Qihua Zhou | Haodong Lu | Li Li
[1] Minyi Guo,et al. Swallow: Joint Online Scheduling and Coflow Compression in Datacenter Networks , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[2] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[3] Gregory R. Ganger,et al. alsched: algebraic scheduling of mixed workloads in heterogeneous clouds , 2012, SoCC '12.
[4] Christopher Ré,et al. DimmWitted: A Study of Main-Memory Statistical Analytics , 2014, Proc. VLDB Endow..
[5] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[6] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[7] Jie Jiang,et al. Angel: a new large-scale machine learning system , 2018.
[8] Raul Castro Fernandez,et al. Ako: Decentralised Deep Learning with Partial Gradient Exchange , 2016, SoCC.
[9] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[10] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[11] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[12] Bo Li,et al. Fast Distributed Deep Learning via Worker-adaptive Batch Sizing , 2018, SoCC.
[13] Scott Shenker,et al. Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.
[14] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Nassir Navab,et al. Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies , 2017, ICLR.
[16] Eric P. Xing,et al. Addressing the straggler problem for iterative convergent parallel ML , 2016, SoCC.
[17] Minyi Guo,et al. Falcon: Towards Computation-Parallel Deep Learning in Heterogeneous Parameter Server , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).
[18] Yufei Tao,et al. DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation , 2015, SIGMOD Conference.
[19] Jiawei Jiang,et al. Heterogeneity-aware Distributed Parameter Servers , 2017, SIGMOD Conference.
[20] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[21] Alexander J. Smola,et al. Efficient mini-batch training for stochastic optimization , 2014, KDD.
[22] Hans-Peter Kriegel,et al. DBSCAN Revisited, Revisited , 2017, ACM Trans. Database Syst..
[23] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[24] Jianping Wu,et al. BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training , 2018, NeurIPS.
[25] Nassir Navab,et al. Revisiting NARX Recurrent Neural Networks for Long-Term Dependencies , 2017, ArXiv.
[26] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[27] Zeyuan Allen-Zhu,et al. Katyusha: the first direct acceleration of stochastic gradient methods , 2016, J. Mach. Learn. Res..
[28] Yaoliang Yu,et al. Petuum: A New Platform for Distributed Machine Learning on Big Data , 2013, IEEE Transactions on Big Data.
[29] Fan Yang,et al. FlexPS: Flexible Parallelism Control in Parameter Server Architecture , 2018, Proc. VLDB Endow..
[30] Onur Mutlu,et al. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds , 2017, NSDI.
[31] Adam Wierman,et al. Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale , 2015, SIGCOMM.
[32] Marshall Copeland,et al. Microsoft Azure , 2015, Apress.
[33] Randy H. Katz,et al. A Berkeley View of Systems Challenges for AI , 2017, ArXiv.
[34] S. Sagar Imambi,et al. PyTorch , 2021, Programming with TensorFlow.
[35] Samy Bengio,et al. Revisiting Distributed Synchronous SGD , 2016, ArXiv.
[36] Scott Shenker,et al. Effective Straggler Mitigation: Attack of the Clones , 2013, NSDI.
[37] Seunghak Lee,et al. Exploiting Bounded Staleness to Speed Up Big Data Analytics , 2014, USENIX Annual Technical Conference.
[38] Albert G. Greenberg,et al. Reining in the Outliers in Map-Reduce Clusters using Mantri , 2010, OSDI.
[39] Alexander J. Smola,et al. Communication Efficient Distributed Machine Learning with the Parameter Server , 2014, NIPS.
[40] Pengtao Xie,et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters , 2017, USENIX Annual Technical Conference.
[41] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[42] Song Guo,et al. Cluster Frameworks for Efficient Scheduling and Resource Allocation in Data Center Networks: A Survey , 2018, IEEE Communications Surveys & Tutorials.
[43] Randy H. Katz,et al. Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.
[44] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[45] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[46] Minyi Guo,et al. Fast Coflow Scheduling via Traffic Compression and Stage Pipelining in Datacenter Networks , 2019, IEEE Transactions on Computers.