Infrastructure-Aware TensorFlow for Heterogeneous Datacenters
M. Mustafa Rafique | Seung-Hwan Lim | Moiz Arif | Zaki Malik
[1] Michael John Sebastian Smith, et al. Application-specific integrated circuits, 1997.
[2] Erik Nijkamp, et al. Deep Learning With TensorFlow: A Review, 2019, Journal of Educational and Behavioral Statistics.
[3] Xi Li, et al. Accelerating Distributed Training in Heterogeneous Clusters via a Straggler-Aware Parameter Server, 2019, 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[4] Mahmut T. Kandemir, et al. Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters, 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).
[5] Timothy Wood, et al. Benefits and challenges of managing heterogeneous data centers, 2013, 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013).
[6] Christina Delimitrou, et al. Paragon: QoS-aware scheduling for heterogeneous datacenters, 2013, ASPLOS '13.
[7] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[8] Quoc V. Le, et al. A Hierarchical Model for Device Placement, 2018, ICLR.
[9] Michael J. Freedman, et al. Resource Elasticity in Distributed Deep Learning, 2020, MLSys.
[10] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, 2018.
[11] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[12] S. Hemminger. Network Emulation with NetEm, 2022.
[13] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Ali R. Butt, et al. MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems, 2020, 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID).
[15] Joseph Manzano, et al. User-transparent Distributed TensorFlow, 2017, ArXiv.
[16] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.
[17] Marco Pavone, et al. A machine learning approach for real-time reachability analysis, 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[18] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[19] Wolfgang Barth, et al. Nagios: System and Network Monitoring, 2006.
[20] Marco Zanetti, et al. Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics, 2019, ArXiv.
[21] David E. Culler, et al. The ganglia distributed monitoring system: design, implementation, and experience, 2004, Parallel Comput.
[22] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[23] Simeng Liu, et al. tensorflow-tracing: A Performance Tuning Framework for Production, 2019, OpML.
[24] Wei-Hua Bai, et al. Performance Analysis of Heterogeneous Data Centers in Cloud Computing Using a Complex Queuing Model, 2015.
[25] Jiawei Jiang, et al. Heterogeneity-aware Distributed Parameter Servers, 2017, SIGMOD Conference.
[26] Theocharis Theocharides, et al. Edge Intelligence: Challenges and Opportunities of Near-Sensor Machine Learning Applications, 2018, 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[27] Lingjia Tang, et al. Heterogeneity in “Homogeneous” Warehouse-Scale Computers: A Performance Opportunity, 2011, IEEE Computer Architecture Letters.
[28] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[29] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Eric P. Xing, et al. Addressing the straggler problem for iterative convergent parallel ML, 2016, SoCC.
[31] David R. Kaeli, et al. Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems, 2013, GPGPU@ASPLOS.