Litz: Elastic Framework for High-Performance Distributed Machine Learning
Abutalib Aghayev | Eric P. Xing | Garth A. Gibson | Weiren Yu | Qirong Ho | Haoyang Chen | Aurick Qiao
[1] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[2] Mark Steyvers,et al. Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.
[3] Lawrence Carin,et al. Sparse multinomial logistic regression: fast algorithms and generalization bounds , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Gil Neiger,et al. Causal memory: definitions, implementation, and programming , 1995, Distributed Computing.
[5] Jieping Ye,et al. Large-scale sparse logistic regression , 2009, KDD.
[6] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[7] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[8] Mahadev Konar,et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX Annual Technical Conference.
[9] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[10] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[11] Peter J. Haas,et al. Large-scale matrix factorization with distributed stochastic gradient descent , 2011, KDD.
[12] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[13] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[14] Carlos Guestrin,et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud , 2012 .
[15] Ambuj Tewari,et al. Feature Clustering for Accelerating Parallel Coordinate Descent , 2012, NIPS.
[16] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[17] Ion Stoica,et al. Probabilistically Bounded Staleness for Practical Partial Quorums , 2012, Proc. VLDB Endow..
[18] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[19] Michael Abd-El-Malek,et al. Omega: flexible, scalable schedulers for large compute clusters , 2013, EuroSys '13.
[20] Aaron Q. Li,et al. Parameter Server for Distributed Machine Learning , 2013 .
[21] Jian Li,et al. Migration-Based Elastic Consolidation Scheduling in Cloud Data Center , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops.
[22] Seunghak Lee,et al. Solving the Straggler Problem with Bounded Staleness , 2013, HotOS.
[23] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[24] Jiaxing Zhang,et al. Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning , 2014 .
[25] Matthew J. Streeter,et al. Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning , 2014, NIPS.
[26] Babak Shahbaba,et al. Distributed Stochastic Gradient MCMC , 2014, ICML.
[27] James T. Kwok,et al. Asynchronous Distributed ADMM for Consensus Optimization , 2014, ICML.
[28] Inderjit S. Dhillon,et al. NOMAD: Nonlocking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion , 2013, Proc. VLDB Endow..
[29] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[30] Eric P. Xing,et al. Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data , 2014 .
[31] Yaoliang Yu,et al. Petuum: A New Platform for Distributed Machine Learning on Big Data , 2013, IEEE Transactions on Big Data.
[32] Tie-Yan Liu,et al. LightLDA: Big Topic Models on Modest Computer Clusters , 2014, WWW.
[33] Prateek Sharma,et al. SpotOn: a batch computing service for the spot market , 2015, SoCC.
[34] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[35] Yaoliang Yu,et al. Distributed Machine Learning via Sufficient Factor Broadcasting , 2015, ArXiv.
[36] Eric P. Xing,et al. Managed communication and consistency for fast data-parallel iterative analytics , 2015, SoCC.
[37] Eric P. Xing,et al. High-Performance Distributed ML at Scale through Parameter Server Consistency Models , 2014, AAAI.
[38] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[39] Xin He,et al. Flint: batch-interactive data-intensive processing on transient servers , 2016, EuroSys.
[40] Seunghak Lee,et al. STRADS: a distributed framework for scheduled model parallel machine learning , 2016, EuroSys.
[41] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[42] Yang Chen,et al. TR-Spark: Transient Computing for Big Data Analytics , 2016, SoCC.
[43] Carlo Curino,et al. Morpheus: Towards Automated SLOs for Enterprise Clusters , 2016, OSDI.
[44] Luke M. Leslie,et al. Supporting On-demand Elasticity in Distributed Graph Processing , 2016, 2016 IEEE International Conference on Cloud Engineering (IC2E).
[45] Srikanth Kandula,et al. Graphene: Packing and Dependency-aware Scheduling for Data-parallel Clusters , 2016, OSDI.
[46] Jin Kyu Kim,et al. Benchmarking Apache Spark with Machine Learning Applications , 2016 .
[47] Aditya Akella,et al. Altruistic Scheduling in Multi-Resource Clusters , 2016, OSDI.
[48] Eric P. Xing,et al. Addressing the straggler problem for iterative convergent parallel ML , 2016, SoCC.
[49] Kevin Duh,et al. DyNet: The Dynamic Neural Network Toolkit , 2017, ArXiv.
[50] Gregory R. Ganger,et al. Proteus: agile ML elasticity through tiered reliability in dynamic resource markets , 2017, EuroSys.
[51] Mingyi Hong,et al. A Distributed, Asynchronous, and Incremental Algorithm for Nonconvex Optimization: An ADMM Approach , 2014, IEEE Transactions on Control of Network Systems.