REDUCT: Keep it Close, Keep it Cool! : Efficient Scaling of DNN Inference on Multi-core CPUs with Near-Cache Compute
Shankar Balachandran | Sreenivas Subramoney | Belliappa Kuttanna | Joydeep Rakshit | Anant V. Nori | Avishaii Abuhatzera | Rahul Bera | Om J. Omer
[1] Minjia Zhang,et al. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster , 2018, USENIX Annual Technical Conference.
[2] Martin D. Schatz,et al. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications , 2018, ArXiv.
[3] Wolfgang Lehner,et al. NeMeSys - A Showcase of Data Oriented Near Memory Graph Processing , 2019, SIGMOD Conference.
[4] Rachata Ausavarungnirun,et al. Processing Data Where It Makes Sense: Enabling In-Memory Computation , 2019, Microprocess. Microsystems.
[5] Rachata Ausavarungnirun,et al. Enabling Practical Processing in and near Memory for Data-Intensive Computing , 2019, DAC.
[6] Nathan Beckmann,et al. Livia: Data-Centric Computing Throughout the Memory Hierarchy , 2020, ASPLOS.
[7] Jun Yang,et al. DrAcc: a DRAM based Accelerator for Accurate CNN Inference , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[8] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Barukh Ziv,et al. Lower Numerical Precision Deep Learning Inference and Training , 2018.
[10] Kiran Kumar Matam,et al. GraphSSD: Graph Semantics Aware SSD , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[11] Scott A. Mahlke,et al. Duality Cache for Data Parallel Acceleration , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[12] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Mikko H. Lipasti,et al. Revolver: Processor architecture for power efficient loop execution , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[14] Xuanzhe Liu,et al. A First Look at Deep Learning Apps on Smartphones , 2018, WWW.
[15] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Jung Ho Ahn,et al. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing , 2013, TACO.
[17] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[18] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[19] Rachata Ausavarungnirun,et al. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks , 2018, ASPLOS.
[20] David Blaauw,et al. Compute Caches , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Matti Siekkinen,et al. Latency and throughput characterization of convolutional neural networks for mobile computer vision , 2018, MMSys.
[22] S. Sagar Imambi,et al. PyTorch , 2021, Programming with TensorFlow.
[23] Carole-Jean Wu,et al. Machine Learning at Facebook: Understanding Inference at the Edge , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[24] Christoforos E. Kozyrakis,et al. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Glenn Henry,et al. High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs Industrial Product , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[26] Alireza Shafaei,et al. FinCACTI: Architectural Analysis and Modeling of Caches with Deeply-Scaled FinFET Devices , 2014, 2014 IEEE Computer Society Annual Symposium on VLSI.
[27] Stijn Eyerman,et al. An Evaluation of High-Level Mechanistic Core Models , 2014, ACM Trans. Archit. Code Optim..
[28] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[29] Jeremy Kepner,et al. Survey and Benchmarking of Machine Learning Accelerators , 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).
[30] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[31] D. Ernst. Competing in Artificial Intelligence Chips: China’s Challenge amid Technology War , 2020.
[32] Cody Coleman,et al. MLPerf Inference Benchmark , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[33] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[34] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[36] Yu Wang,et al. GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[37] Onur Mutlu,et al. Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Anshumali Shrivastava,et al. SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems , 2019, MLSys.
[39] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Alexander Heinecke,et al. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[41] David Blaauw,et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[42] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[43] Wei Chen,et al. SkyLake-SP: A 14nm 28-Core Xeon® processor , 2018, 2018 IEEE International Solid-State Circuits Conference (ISSCC).
[44] Wei Wang,et al. MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving , 2019, USENIX Annual Technical Conference.
[45] Bevan M. Baas,et al. Corrigendum to "Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm" [Integr. VLSI J. 58. (2017) 74-81] , 2019, Integr..
[46] Onur Mutlu,et al. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).