Improving the Performance of Distributed MXNet with RDMA
Xu Jin | Han Lin | Zheng Wu | Mingfan Li | Ke Wen | Hong An | Mengxian Chi
[1] Sayantan Sur, et al. Unifying UPC and MPI runtimes: experience with MVAPICH, 2010, PGAS '10.
[2] Sayantan Sur, et al. A Brief Introduction to the OpenFabrics Interfaces - A New Network API for Maximizing High Performance Application Efficiency, 2015, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects.
[3] Amith R. Mamidala, et al. MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning, 2018, ArXiv.
[4] David G. Andersen, et al. Using RDMA efficiently for key-value services, 2015, SIGCOMM 2015.
[5] Dhabaleswar K. Panda, et al. High performance RDMA-based design of HDFS over InfiniBand, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] James Demmel, et al. ImageNet Training in Minutes, 2017, ICPP.
[7] Dhabaleswar K. Panda, et al. High-Performance Design of Hadoop RPC with RDMA over InfiniBand, 2013, 2013 42nd International Conference on Parallel Processing.
[8] Erik Cambria, et al. Recent Trends in Deep Learning Based Natural Language Processing, 2017, IEEE Comput. Intell. Mag.
[9] Qian Liu, et al. An Integrated Tutorial on InfiniBand, Verbs, and MPI, 2017, IEEE Communications Surveys & Tutorials.
[10] Hai Jin, et al. An Introduction to the InfiniBand Architecture, 2002.
[11] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[12] Sayantan Sur, et al. Memcached Design on High Performance RDMA Capable Interconnects, 2011, 2011 International Conference on Parallel Processing.
[13] Jinyang Li, et al. Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store, 2013, USENIX ATC.
[14] Aaron Q. Li, et al. Parameter Server for Distributed Machine Learning, 2013.
[15] Dhabaleswar K. Panda, et al. High performance RDMA-based MPI implementation over InfiniBand, 2003, ICS.
[16] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[17] Gustavo Pérez, et al. Automated detection of lung nodules with three-dimensional convolutional neural networks, 2017, Symposium on Medical Information Processing and Analysis.
[18] Wenting Han, et al. Improving the Performance of Distributed TensorFlow with RDMA, 2017, International Journal of Parallel Programming.
[19] Marleen de Bruijne, et al. Machine learning approaches in medical image analysis: From detection to diagnosis, 2016, Medical Image Anal.