CEML: a Coordinated Runtime System for Efficient Machine Learning on Heterogeneous Computing Systems
暂无分享,去创建一个
Kyu Yeun Kim | Woongki Baek | Jinsu Park | Jihoon Hyun | Seongdae Yu | Woongki Baek | Jinsu Park | Jihoon Hyun | Kyu Yeun Kim | Seongdae Yu
[1] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[2] Vanchinathan Venkataramani,et al. Hierarchical power management for asymmetric multi-core in dark silicon era , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[3] Woongki Baek,et al. RCHC: A Holistic Runtime System for Concurrent Heterogeneous Computing , 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[4] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[5] Anuj Pathania,et al. Power-performance modelling of mobile gaming workloads on heterogeneous MPSoCs , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[6] Xiaowei Li,et al. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[7] Woongki Baek,et al. HAP: A Heterogeneity-Conscious Runtime System for Adaptive Pipeline Parallelism , 2016, Euro-Par.
[8] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.
[9] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[10] Henry Hoffmann,et al. Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments , 2010, ICAC '10.
[11] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[12] Quan Chen,et al. DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[13] Scott A. Mahlke,et al. Equalizer: Dynamic Tuning of GPU Resources for Efficient Execution , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[14] Woongki Baek,et al. HARS: A heterogeneity-aware runtime system for self-adaptive multithreaded applications , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).