Automatic instruction-set architecture synthesis for VLIW processor cores in the ASAM project

Abstract The design of high-performance application-specific multi-core processor systems is still a time-consuming task that involves many manual steps and decisions that must be made by experienced design engineers. The ASAM project sought to change this by proposing an automatic architecture synthesis and mapping flow aimed at the design of such application-specific instruction-set processor (ASIP) systems. The ASAM flow separated the design problem into two cooperating exploration levels, known as macro-level and micro-level exploration. This paper presents an overview of the micro-level exploration, which is concerned with the analysis and design of the individual processors within the overall multi-core design, starting at the initial exploration stages and continuing up to the selection of the final design of each processor in the system. The designed processors combine very-long instruction-word (VLIW), single-instruction multiple-data (SIMD), and complex custom DSP-like operations to provide area- and energy-efficient, high-performance execution of the program parts assigned to the processor node. We describe how the micro-level design space exploration interacts with the macro-level, how early performance estimates are used within the ASAM flow to determine the tasks executed by each processor node, and how an initial processor design is then proposed and refined into a highly specialized VLIW ASIP. The micro-level architecture exploration is then demonstrated with a walk-through of the process on an example program kernel to further clarify the exploration and architecture specialization process. The main finding of the experimental research is that the presented method enables automatic instruction-set architecture synthesis for VLIW ASIPs within a reasonable exploration time.
Using the presented approach, we automatically determined an initial architecture prototype that met the temporal performance requirements of the target application. Subsequent refinement of this architecture considerably reduced both the design area (by 4x) and the active energy consumption (by 2x).
