Towards Memory-Efficient Allocation of CNNs on Processing-in-Memory Architecture
Yi Wang | Jing Yang | Tao Li | Weixuan Chen
[1] Luca Benini,et al. A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[2] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[3] Alireza Ejlali,et al. NPAM: NVM-Aware Page Allocation for Multi-Core Embedded Systems , 2017, IEEE Transactions on Computers.
[4] Eduard Ayguadé,et al. Task Scheduling Techniques for Asymmetric Multi-Core Systems , 2017, IEEE Transactions on Parallel and Distributed Systems.
[5] Yiran Chen,et al. A new learning method for inference accuracy, core occupation, and performance co-optimization on TrueNorth chip , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[6] Laurence T. Yang,et al. Resource Sharing in Multicore Mixed-Criticality Systems: Utilization Bound and Blocking Overhead , 2017, IEEE Transactions on Parallel and Distributed Systems.
[7] Christoforos E. Kozyrakis,et al. Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[8] Yu Wang,et al. Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[9] Youyou Lu,et al. Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency , 2017, ArXiv.
[10] Gi-Ho Park,et al. NVM Way Allocation Scheme to Reduce NVM Writes for Hybrid Cache Architecture in Chip-Multiprocessors , 2017, IEEE Transactions on Parallel and Distributed Systems.
[11] Mingyu Gao,et al. HRL: Efficient and flexible reconfigurable logic for near-data processing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[12] Yann LeCun,et al. Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.
[13] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.
[14] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[15] Jing Huang,et al. Energy-Efficient Resource Utilization for Heterogeneous Embedded Computing Systems , 2017, IEEE Transactions on Computers.
[16] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[17] Luca Benini,et al. YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[18] Laurence T. Yang,et al. Multicore Mixed-Criticality Systems: Partitioned Scheduling and Utilization Bound , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[19] Yi Lin,et al. Durable and Energy Efficient In-Memory Frequent-Pattern Mining , 2017, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[20] Kiyoung Choi,et al. Efficient FPGA acceleration of Convolutional Neural Networks using logical-3D compute array , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[21] Franz Franchetti,et al. Data reorganization in memory using 3D-stacked DRAM , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[22] Soheil Ghiasi,et al. Implementation-Aware Model Analysis: The Case of Buffer-Throughput Tradeoff in Streaming Applications , 2015, LCTES.
[23] Nikil D. Dutt,et al. SPARTA: Runtime task allocation for energy efficient heterogeneous manycores , 2016, 2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[24] Yang Li,et al. Non-Volatile Memory Based Page Swapping for Building High-Performance Mobile Devices , 2017, IEEE Transactions on Computers.
[25] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[26] Shimeng Yu,et al. MNSIM: Simulation platform for memristor-based neuromorphic computing system , 2016, DATE 2016.
[27] Onur Mutlu,et al. Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[28] Dakshina Dasari,et al. Time-Triggered Co-Scheduling of Computation and Communication with Jitter Requirements , 2017, IEEE Transactions on Computers.
[29] Soheil Ghiasi,et al. Look into details: the benefits of fine-grain streaming buffer analysis , 2010, LCTES '10.
[30] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[32] Waqar Ali,et al. BWLOCK: A Dynamic Memory Access Control Framework for Soft Real-Time Applications on Multicore Platforms , 2017, IEEE Transactions on Computers.
[33] Yannis Papakonstantinou,et al. SSD In-Storage Computing for Search Engines , 2016 .
[34] Zili Shao,et al. Memory-Aware Task Scheduling with Communication Overhead Minimization for Streaming Applications on Bus-Based Multiprocessor System-on-Chips , 2014, IEEE Transactions on Parallel and Distributed Systems.
[35] Jing Yang,et al. Towards memory-efficient processing-in-memory architecture for convolutional neural networks , 2017, LCTES.
[36] Sudhakar Yalamanchili,et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[37] Yi Wang,et al. vFlash: Virtualized Flash for Optimizing the I/O Performance in Mobile Devices , 2017, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[38] Jie Xu,et al. DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[39] Aaron Smith,et al. A machine learning approach to mapping streaming workloads to dynamic multicore processors , 2016, LCTES.