Shankar Balachandran | Sreenivas Subramoney | Om Ji Omer | Belliappa Kuttanna | Joydeep Rakshit | Anant V. Nori | Rahul Bera | Avishaii Abuhatzera
[1] David Blaauw,et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[2] Scott A. Mahlke,et al. Duality Cache for Data Parallel Acceleration , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[3] Minjia Zhang,et al. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster , 2018, USENIX Annual Technical Conference.
[4] Kiran Kumar Matam,et al. GraphSSD: Graph Semantics Aware SSD , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[5] Stijn Eyerman,et al. An Evaluation of High-Level Mechanistic Core Models , 2014, ACM Trans. Archit. Code Optim.
[6] Carole-Jean Wu,et al. MLPerf Inference Benchmark , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[7] Bevan M. Baas,et al. Corrigendum to "Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm" [Integr. VLSI J. 58 (2017) 74-81] , 2019, Integr.
[8] Jung Ho Ahn,et al. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing , 2013, TACO.
[9] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[13] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[15] Christoforos E. Kozyrakis,et al. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[16] Xuanzhe Liu,et al. A First Look at Deep Learning Apps on Smartphones , 2018, WWW.
[17] Onur Mutlu,et al. Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Carole-Jean Wu,et al. Machine Learning at Facebook: Understanding Inference at the Edge , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[19] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[20] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[21] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[22] Alireza Shafaei,et al. FinCACTI: Architectural Analysis and Modeling of Caches with Deeply-Scaled FinFET Devices , 2014, 2014 IEEE Computer Society Annual Symposium on VLSI.
[23] David Blaauw,et al. Compute Caches , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[24] Jun Yang,et al. DrAcc: a DRAM based Accelerator for Accurate CNN Inference , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[25] Onur Mutlu,et al. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[26] Yu Wang,et al. GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[27] Matti Siekkinen,et al. Latency and throughput characterization of convolutional neural networks for mobile computer vision , 2018, MMSys.
[28] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Wei Wang,et al. MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving , 2019, USENIX Annual Technical Conference.
[30] Wolfgang Lehner,et al. NeMeSys - A Showcase of Data Oriented Near Memory Graph Processing , 2019, SIGMOD Conference.
[31] Wei Chen,et al. SkyLake-SP: A 14nm 28-Core Xeon® processor , 2018, 2018 IEEE International Solid-State Circuits Conference (ISSCC).
[32] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[33] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[34] Martin D. Schatz,et al. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications , 2018, ArXiv.
[35] Rachata Ausavarungnirun,et al. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks , 2018, ASPLOS.
[36] Alexander Heinecke,et al. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.