Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures
Yang Hu | Tao Li | Huixiang Chen | Mingcong Song
[1] Nam Sung Kim,et al. The case for GPGPU spatial multitasking , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[2] Xin Fu,et al. Characterizing, modeling, and improving the QoE of mobile devices with low battery level , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Vijay Janapa Reddi,et al. Event-based scheduling for energy-efficient QoS (eQoS) in mobile Web applications , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[4] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[6] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[7] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Longjun Liu,et al. Towards sustainable in-situ server systems in the big data era , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[9] Depei Qian,et al. Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems , 2016, ICS.
[10] Zheng Wang,et al. Using latency to evaluate interactive system performance , 1996, OSDI '96.
[11] Ronald G. Dreslinski,et al. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.
[12] John Kim,et al. Improving GPGPU resource utilization through alternative thread block scheduling , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[13] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[14] Nikolaos Papanikolopoulos,et al. Multi-class active learning for image classification , 2009, CVPR.
[15] Jingling Yuan,et al. Bridging the semantic gaps of GPU acceleration for scale-out CNN-based big data processing: Think big, see small , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[16] Yun Liang,et al. Efficient GPU Spatial-Temporal Multitasking , 2015, IEEE Transactions on Parallel and Distributed Systems.
[17] Carole-Jean Wu,et al. Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[18] Scott A. Mahlke,et al. SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Jian Sun,et al. Accelerating Very Deep Convolutional Networks for Classification and Detection , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[21] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[22] Quan Chen,et al. DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[23] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[24] Xipeng Shen,et al. SatScore: uncovering and avoiding a principled pitfall in responsiveness measurements of app launches , 2014, UbiComp.
[25] Dongrui Fan,et al. Enabling coordinated register allocation and thread-level parallelism optimization for GPUs , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[27] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[28] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[29] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.