
COSMOS

Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several implementations of each component of a complex hardware accelerator, each with a different performance-cost trade-off. However, navigating this design space in search of the Pareto-optimal implementations at the system level is a hard optimization task. We present COSMOS, an automatic methodology for the design-space exploration (DSE) of complex accelerators that coordinates both HLS and memory-optimization tools in a compositional way. First, thanks to the co-design of datapath and memory, COSMOS produces a large set of Pareto-optimal implementations for each component of the accelerator. Then, COSMOS leverages compositional design techniques to quickly converge to the desired trade-off point between cost and performance at the system level. When applied to the system-level design (SLD) of an accelerator for wide-area motion imagery (WAMI), COSMOS explores the design space as completely as an exhaustive search while reducing the number of invocations of the HLS tool by up to 14.6×.
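The two-step idea above (a Pareto set per component, then composition at the system level) can be illustrated with a minimal sketch. This is not the COSMOS algorithm. The names `Impl`, `pareto_front` and `compose`, the choice of latency and area as the two metrics, the assumption that components form a pipeline (so system latency is the maximum of component latencies and system area is their sum), and all numbers are hypothetical, chosen only for illustration.

```python
from itertools import product
from typing import NamedTuple


class Impl(NamedTuple):
    """Hypothetical implementation point: latency (cycles) and area (arbitrary units)."""
    latency: float
    area: float


def pareto_front(points):
    """Keep points not dominated by any other point (lower is better for both metrics)."""
    front = []
    for p in points:
        # q dominates p if it is no worse in both metrics and differs from p.
        dominated = any(
            q.latency <= p.latency and q.area <= p.area and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    # Remove exact duplicates and sort by latency, then area.
    return sorted(set(front))


def compose(fronts):
    """Combine per-component Pareto fronts into system-level Pareto points.

    Assumption (for illustration only): components run as a pipeline, so the
    slowest component sets system latency, while areas simply add up.
    """
    system_points = [
        Impl(latency=max(c.latency for c in combo),
             area=sum(c.area for c in combo))
        for combo in product(*fronts)
    ]
    return pareto_front(system_points)


if __name__ == "__main__":
    # Made-up synthesis results for two components (not data from the paper).
    comp_a = [Impl(100, 5), Impl(60, 8), Impl(120, 9), Impl(40, 15)]
    comp_b = [Impl(80, 4), Impl(50, 7), Impl(45, 12)]
    fa, fb = pareto_front(comp_a), pareto_front(comp_b)
    print(compose([fa, fb]))
    # [Impl(latency=45, area=27), Impl(latency=50, area=22), Impl(latency=60, area=15),
    #  Impl(latency=80, area=12), Impl(latency=100, area=9)]
```

Under these max/sum assumptions, discarding dominated component points before composing loses no system-level Pareto point, which is why pruning each component's set first pays off: the number of combinations is the product of the front sizes, so smaller fronts shrink the system-level search.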
