Efficient and Programmable Distributed Shared Memory Systems for Machine Learning Training
[1] Joseph K. Bradley,et al. Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale , 2016, NIPS.
[2] Peter Norvig,et al. Deep Learning with Dynamic Computation Graphs , 2017, ICLR.
[3] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[4] Alan Edelman,et al. Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev.
[5] Garth A. Gibson,et al. PRObE: A Thousand-Node Experimental Cluster for Computer Systems Research , 2013, ;login: The USENIX Magazine.
[6] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[7] Eric P. Xing,et al. Exploiting iterative-ness for parallel ML computations , 2014, SoCC.
[8] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[9] Tianqi Chen,et al. XGBoost: A Scalable Tree Boosting System , 2016, KDD.
[10] Weimin Zheng,et al. Exploring the Hidden Dimension in Graph Processing , 2016, OSDI.
[11] Seunghak Lee,et al. STRADS: a distributed framework for scheduled model parallel machine learning , 2016, EuroSys.
[12] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2016, CVPR.
[13] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[14] Michael I. Jordan,et al. SparkNet: Training Deep Networks in Spark , 2015, ICLR.
[15] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998, IEEE Computational Science and Engineering.
[16] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[17] Shou-De Lin,et al. Feature Engineering and Classifier Ensemble for KDD Cup 2010 , 2010, KDD Cup.
[18] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[19] Wei Li,et al. Tux2: Distributed Graph Computation for Machine Learning , 2017, NSDI.
[20] Joseph M. Hellerstein,et al. GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.
[21] John K. Ousterhout,et al. Scripting: Higher-Level Programming for the 21st Century , 1998, Computer.
[22] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[23] Matthew J. Streeter,et al. Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning , 2014, NIPS.
[24] Tie-Yan Liu,et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree , 2017, NIPS.
[25] Kirk L. Johnson,et al. CRL: high-performance all-software distributed shared memory , 1995, SOSP.
[26] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[27] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[28] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[29] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[30] Samy Bengio,et al. Tensor2Tensor for Neural Machine Translation , 2018, AMTA.
[31] Scott Shenker,et al. Effective Straggler Mitigation: Attack of the Clones , 2013, NSDI.
[32] Eric P. Xing,et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server , 2016, EuroSys.
[33] Carlos Guestrin,et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud , 2012, Proc. VLDB Endow.
[34] Steven Hand,et al. CIEL: A Universal Execution Engine for Distributed Data-Flow Computing , 2011, NSDI.
[35] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[36] Elad Hoffer,et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks , 2017, NIPS.
[37] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[38] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[39] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[40] Alexander J. Smola,et al. Scalable inference in latent variable models , 2012, WSDM '12.
[41] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2016, ICML.
[42] Albert G. Greenberg,et al. Reining in the Outliers in Map-Reduce Clusters using Mantri , 2010, OSDI.
[43] M. Abadi,et al. Naiad: a timely dataflow system , 2013, SOSP.
[44] Gu-Yeon Wei,et al. HELIX: automatic parallelization of irregular programs for chip multiprocessing , 2012, CGO '12.
[45] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[46] Martín Abadi,et al. Dynamic control flow in large-scale machine learning , 2018, EuroSys.
[47] Eric P. Xing,et al. Managed communication and consistency for fast data-parallel iterative analytics , 2015, SoCC.
[48] Mohammed J. Zaki,et al. Arabesque: a system for distributed graph mining , 2015, SOSP.
[49] Michael Isard,et al. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language , 2008, OSDI.
[50] Seunghak Lee,et al. Exploiting Bounded Staleness to Speed Up Big Data Analytics , 2014, USENIX Annual Technical Conference.
[51] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[52] Binyu Zang,et al. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs , 2019, TOPC.
[53] David A. Patterson,et al. A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution , 2018, IEEE Micro.
[54] Medhat A. Moussa,et al. Resource Efficient Arithmetic Effects on RBM Neural Network Solution Quality Using MNIST , 2011, International Conference on Reconfigurable Computing and FPGAs.
[55] Eric P. Xing,et al. High-Performance Distributed ML at Scale through Parameter Server Consistency Models , 2014, AAAI.
[56] Monica S. Lam,et al. Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.
[57] Paul Hudak,et al. Memory coherence in shared virtual memory systems , 1986, PODC '86.
[58] Wenguang Chen,et al. Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.
[59] Michael I. Jordan,et al. Estimation, Optimization, and Parallelism when Data is Sparse , 2013, NIPS.
[60] Itamar Arel,et al. Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks , 2013, ICLR.
[61] Reynold Xin,et al. GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.
[62] Yoshua Bengio,et al. Low precision arithmetic for deep learning , 2014, ICLR.
[63] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[64] Monica S. Lam,et al. Efficient and exact data dependence analysis , 1991, PLDI '91.
[65] Alan Edelman,et al. On Machine Learning and Programming Languages , 2018, SysML.
[66] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.
[67] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[68] Jinyang Li,et al. Piccolo: Building fast, distributed programs with partitioned tables , 2010, OSDI.
[69] Veljko M. Milutinovic,et al. Distributed shared memory: concepts and systems , 1997, IEEE Parallel Distributed Technol. Syst. Appl.
[70] Seunghak Lee,et al. Solving the Straggler Problem with Bounded Staleness , 2013, HotOS.
[71] Joelle Pineau,et al. Conditional Computation in Neural Networks for faster models , 2015, ArXiv.