Taming Extreme Heterogeneity via Machine Learning based Design of Autonomous Manycore Systems
Shahin Nazarian | Janardhan Rao Doppa | Paul Bogdan | Yao Xiao | Linghao Song | Fan Chen | Biresh Kumar Joardar | Hai (Helen) Li | Aryan Deshwal
[1] Shahin Nazarian,et al. Self-Optimizing and Self-Programming Computing Systems: A Combined Compiler, Complex Networks, and Machine Learning Approach , 2019, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[2] Yuankun Xue,et al. Reconstructing missing complex networks against adversarial interventions , 2019, Nature Communications.
[3] Stefan Schaal,et al. Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.
[4] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[5] Radu Marculescu,et al. Learning-Based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems , 2018, IEEE Transactions on Computers.
[6] Yuankun Xue,et al. Scalable and realistic benchmark synthesis for efficient NoC performance evaluation: A complex network analysis approach , 2016, 2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[7] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[8] Shahin Nazarian,et al. Prometheus: Processing-in-memory heterogeneous architecture design from a multi-layer network theoretic strategy , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[9] Partha Pratim Pande,et al. Monolithic 3D-Enabled High Performance and Energy Efficient Network-on-Chip , 2017, 2017 IEEE International Conference on Computer Design (ICCD).
[10] R. Jordan,et al. NVM neuromorphic core with 64k-cell (256-by-256) phase change memory synaptic array with on-chip neuron circuits for continuous in-situ learning , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).
[11] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[12] Paul Bogdan,et al. Ollivier-Ricci Curvature-Based Method to Community Detection in Complex Networks , 2019, Scientific Reports.
[13] Hao Yu,et al. Energy efficient in-memory machine learning for data intensive image-processing by non-volatile domain-wall memory , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).
[14] Partha Pratim Pande,et al. Machine Learning for Design Space Exploration and Optimization of Manycore Systems , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[15] Hao Jiang,et al. RENO: A high-efficient reconfigurable neuromorphic computing accelerator design , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[16] Radu Marculescu,et al. Imitation Learning for Dynamic VFI Control in Large-Scale Manycore Systems , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[17] Carole-Jean Wu,et al. Quantifying the energy cost of data movement for emerging smart phone workloads on mobile platforms , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[18] Georg Hager,et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[19] Partha Pratim Pande,et al. REGENT: A Heterogeneous ReRAM/GPU-based Architecture Enabled by NoC for Training CNNs , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[20] Axel Jantsch,et al. The Benefits of Self-Awareness and Attention in Fog and Mist Computing , 2015, Computer.
[21] Yuankun Xue,et al. User Cooperation Network Coding Approach for NoC Performance Improvement , 2015, NOCS.
[22] Partha Pratim Pande,et al. Impact of Electrostatic Coupling on Monolithic 3D-enabled Network on Chip , 2019, ACM Trans. Design Autom. Electr. Syst..
[23] Umit Y. Ogras,et al. Dynamic Resource Management of Heterogeneous Mobile Platforms via Imitation Learning , 2019, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[24] Ji Li,et al. Fundamental Challenges Toward Making the IoT a Reachable Reality , 2017, ACM Trans. Design Autom. Electr. Syst..
[25] Partha Pratim Pande,et al. Optimizing 3D NoC design for energy efficiency: A machine learning approach , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[26] Yiran Chen,et al. GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[27] Zbigniew J. Czech,et al. Introduction to Parallel Computing , 2017.
[28] Yiran Chen,et al. ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[29] Ujjwal Maulik,et al. A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA , 2008, IEEE Transactions on Evolutionary Computation.
[30] Radu Marculescu,et al. Hybrid On-Chip Communication Architectures for Heterogeneous Manycore Systems , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[31] Catherine Graves,et al. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[32] Tianshi Chen,et al. DaDianNao: A Neural Network Supercomputer , 2017, IEEE Transactions on Computers.
[33] Kaushik Roy,et al. SPINDLE: SPINtronic Deep Learning Engine for large-scale neuromorphic computing , 2014, 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).
[34] Axel Jantsch,et al. Toward Smart Embedded Systems , 2016, ACM Trans. Embed. Comput. Syst..
[35] Jung Ho Ahn,et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[36] Martine D. F. Schlag,et al. Spectral K-way ratio-cut partitioning and clustering , 1994, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..
[37] Alex Krizhevsky,et al. One weird trick for parallelizing convolutional neural networks , 2014, ArXiv.
[38] Partha Pratim Pande,et al. Design and Optimization of Heterogeneous Manycore Systems Enabled by Emerging Interconnect Technologies: Promises and Challenges , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[39] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[40] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[41] Yuankun Xue,et al. Improving NoC performance under spatio-temporal variability by runtime reconfiguration: a general mathematical framework , 2016, 2016 Tenth IEEE/ACM International Symposium on Networks-on-Chip (NOCS).
[42] Partha Pratim Pande,et al. Performance and Thermal Tradeoffs for Energy-Efficient Monolithic 3D Network-on-Chip , 2018, ACM Trans. Design Autom. Electr. Syst..
[43] Qing Wu,et al. Hardware realization of BSB recall function using memristor crossbar arrays , 2012, DAC Design Automation Conference 2012.
[44] Martin Lukasiewycz,et al. SAT-decoding in evolutionary algorithms for discrete constrained optimization problems , 2007, 2007 IEEE Congress on Evolutionary Computation.
[45] Gu-Yeon Wei,et al. Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[46] Axel Jantsch,et al. Self-Awareness in Systems on Chip— A Survey , 2017, IEEE Design & Test.
[47] Shahin Nazarian,et al. A load balancing inspired optimization framework for exascale multicore systems: A complex networks approach , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[48] Radu Marculescu,et al. Machine Learning and Manycore Systems Design: A Serendipitous Symbiosis , 2018, Computer.
[49] Hai Li,et al. EMAT: An Efficient Multi-Task Architecture for Transfer Learning using ReRAM , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[50] Santo Fortunato,et al. Community detection in graphs , 2009, ArXiv.
[51] Partha Pratim Pande,et al. MOOS , 2019, ACM Trans. Embed. Comput. Syst..
[52] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[53] Yiran Chen,et al. ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).
[54] Xuehai Qian,et al. HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[55] Donatella Sciuto,et al. Optimization Strategies in Design Space Exploration , 2017, Handbook of Hardware/Software Codesign.
[56] Yuankun Xue,et al. Reliable Multi-Fractal Characterization of Weighted Complex Networks: Algorithms and Implications , 2017, Scientific Reports.
[57] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998.
[58] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[59] Partha Pratim Pande,et al. Design-Space Exploration and Optimization of an Energy-Efficient and Reliable 3-D Small-World Network-on-Chip , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[60] Radu Marculescu,et al. On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems , 2017, IEEE Transactions on Computers.
[61] Kalyanmoy Deb,et al. A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..
[62] Alexander J. Smola,et al. Communication Efficient Distributed Machine Learning with the Parameter Server , 2014, NIPS.