Jiawei Jiang | Xiangru Lian | Binhang Yuan | Ji Liu | Shaoduo Gan | Tengxu Sun | Chengjun Liu | Rui Wang | Ce Zhang | Sen Yang | Jianbin Chang | Hongmei Shi | Shengzhuo Zhang | Xianghong Li
[1] Tianqi Chen,et al. XGBoost: A Scalable Tree Boosting System , 2016, KDD.
[2] Yibo Zhu,et al. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters , 2020, OSDI.
[3] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[4] Hanlin Tang,et al. Communication Compression for Decentralized Training , 2018, NeurIPS.
[5] Sebastian U. Stich,et al. Local SGD Converges Fast and Communicates Little , 2018, ICLR.
[6] Beng Chin Ooi,et al. Rafiki: Machine Learning as an Analytics Service System , 2018, Proc. VLDB Endow..
[7] Zhipeng Zhang,et al. PS2: Parameter Server on Spark , 2019, SIGMOD Conference.
[8] Dan Alistarh,et al. The Convergence of Sparsified Gradient Methods , 2018, NeurIPS.
[9] Pengtao Xie,et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters , 2017, USENIX Annual Technical Conference.
[10] Quoc V. Le,et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism , 2018, ArXiv.
[11] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[12] Zhipeng Zhang,et al. MLlib*: Fast Training of GLMs Using Spark MLlib , 2019, 2019 IEEE 35th International Conference on Data Engineering (ICDE).
[13] Jie Jiang,et al. Angel: a new large-scale machine learning system , 2018 .
[14] Yi Li,et al. Mariana: Tencent Deep Learning Platform and its Applications , 2014, Proc. VLDB Endow..
[15] Minjie Wang,et al. Supporting Very Large Models using Automatic Dataflow Graph Partitioning , 2018, EuroSys.
[16] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[17] Alexander Sergeev,et al. Horovod: fast and easy distributed deep learning in TensorFlow , 2018, ArXiv.
[18] Ji Liu,et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression , 2019, ICML.
[19] Martin Jaggi,et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication , 2019, ICML.
[20] Dan Alistarh,et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning , 2017, ICML.
[21] Nenghai Yu,et al. Asynchronous Stochastic Gradient Descent with Delay Compensation , 2016, ICML.
[22] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[23] Ali Taylan Cemgil,et al. Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization , 2018, ICML.
[24] Chris Jermaine,et al. Tensor Relational Algebra for Distributed Machine Learning System Design , 2021, Proc. VLDB Endow..
[25] Vladimir Braverman,et al. Communication-efficient distributed SGD with Sketching , 2019, NeurIPS.
[26] Anthony K. H. Tung,et al. SINGA: A Distributed Deep Learning Platform , 2015, ACM Multimedia.
[27] Li Fei-Fei,et al. Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go? , 2018, ICML.
[28] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[29] Farzin Haddadpour,et al. Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization , 2019, NeurIPS.
[30] Shirish Tatikonda,et al. Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML , 2014, Proc. VLDB Endow..
[31] Hui Bu,et al. AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale , 2018, ArXiv.
[32] Ruben Mayer,et al. Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools , 2019 .
[33] Michael I. Jordan,et al. SparkNet: Training Deep Networks in Spark , 2015, ICLR.
[34] Martin Jaggi,et al. Sparsified SGD with Memory , 2018, NeurIPS.
[35] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[36] Wei Zhang,et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent , 2017, ICML.
[37] 知秀 柴田. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .
[38] Kamyar Azizzadenesheli,et al. signSGD: compressed optimisation for non-convex problems , 2018, ICML.
[39] Xiangru Lian,et al. 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed , 2021, ICML.
[40] Shen Li,et al. PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers , 2021, ArXiv.
[41] Fan Yang,et al. FlexPS: Flexible Parallelism Control in Parameter Server Architecture , 2018, Proc. VLDB Endow..
[42] Ji Liu,et al. Gradient Sparsification for Communication-Efficient Distributed Optimization , 2017, NeurIPS.
[43] Shirish Tatikonda,et al. SystemML: Declarative Machine Learning on Spark , 2016, Proc. VLDB Endow..
[44] Gang Chen,et al. SINGA: Putting Deep Learning in the Hands of Multimedia Users , 2015, ACM Multimedia.
[45] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[46] Hao Zhang,et al. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models , 2021, ICML.
[47] Xiangru Lian,et al. D2: Decentralized Training over Decentralized Data , 2018, ICML.
[48] Mani B. Srivastava,et al. In-database Distributed Machine Learning: Demonstration using Teradata SQL Engine , 2019, Proc. VLDB Endow..
[49] Nam Sung Kim,et al. Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training , 2018, NeurIPS.
[50] Kunle Olukotun,et al. Understanding and optimizing asynchronous low-precision stochastic gradient descent , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[51] Shenghuo Zhu,et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning , 2018, AAAI.
[52] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[53] Alexander Aiken,et al. Beyond Data and Model Parallelism for Deep Neural Networks , 2018, SysML.
[54] Jiawei Jiang,et al. Heterogeneity-aware Distributed Parameter Servers , 2017, SIGMOD Conference.
[55] Chuck Bear,et al. Vertica-ML: Distributed Machine Learning in Vertica Database , 2020, SIGMOD Conference.
[56] Rolf Rabenseifner,et al. Optimization of Collective Reduction Operations , 2004, International Conference on Computational Science.
[57] Peter Richtárik,et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption , 2018, ICML.
[58] Tao Lin,et al. Don't Use Large Mini-Batches, Use Local SGD , 2018, ICLR.
[59] Asim Kadav,et al. MALT: distributed data-parallelism for existing ML applications , 2015, EuroSys.
[60] Tong Yang,et al. SketchML: Accelerating Distributed Machine Learning with Data Sketches , 2018, SIGMOD Conference.
[61] Chris Jermaine,et al. Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning , 2019, ArXiv.
[62] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[63] Peter Richtárik,et al. On Biased Compression for Distributed Learning , 2020, ArXiv.
[64] Amar Phanishayee,et al. Memory-Efficient Pipeline-Parallel DNN Training , 2021, ICML.
[65] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[66] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[67] Sanjay Chawla,et al. A Cost-based Optimizer for Gradient Descent Optimization , 2017, SIGMOD Conference.
[68] Suhas Diggavi,et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations , 2019, IEEE Journal on Selected Areas in Information Theory.
[69] Nikhil R. Devanur,et al. PipeDream: generalized pipeline parallelism for DNN training , 2019, SOSP.
[70] Eric P. Xing,et al. High-Performance Distributed ML at Scale through Parameter Server Consistency Models , 2014, AAAI.
[71] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[72] Mladen Kolar,et al. Efficient Distributed Learning with Sparsity , 2016, ICML.
[73] Mohammad Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[74] Yuan Qi,et al. Asynchronous Distributed Variational Gaussian Process for Regression , 2017, ICML.
[75] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[76] Ce Zhang,et al. Distributed Learning Systems with First-Order Methods , 2020, Found. Trends Databases.
[77] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[78] Jiawei Jiang,et al. DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions , 2018, SIGMOD Conference.
[79] Eric P. Xing,et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server , 2016, EuroSys.
[80] Jianyu Wang,et al. Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD , 2018, MLSys.
[81] Dimitris S. Papailiopoulos,et al. ATOMO: Communication-efficient Learning via Atomic Sparsification , 2018, NeurIPS.
[82] Dustin Tran,et al. Mesh-TensorFlow: Deep Learning for Supercomputers , 2018, NeurIPS.
[83] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[84] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, ArXiv (1610.02132).
[85] Carsten Binnig,et al. DB4ML - An In-Memory Database Kernel with Machine Learning Support , 2020, SIGMOD Conference.