Exploiting Instruction Level Parallelism for REPLICA - A Configurable VLIW Architecture With Chained Functional Units

In this paper we present a scheduling algorithm for VLIW architectures with chained functional units. We show how the algorithm can speed up programs at the instruction level on REPLICA, a configurable emulated shared memory (CESM) architecture whose computation model is based on the PRAM model. Because our LLVM-based compiler is parameterizable in the number of different functional units, the read and write ports to the register file, and similar properties, we can generate code for REPLICA architectures with different functional unit configurations. For a set of such configurations we show that our implementation produces high-quality code, and we argue that the high degree of parametrization of the compiler makes it, together with the simulator, useful for hardware/software co-design.
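
To illustrate what scheduling for chained functional units means, the following is a minimal sketch of a greedy list scheduler over a parameterizable machine description. It is not the algorithm of the paper. It assumes a simplified model in which one instruction word is an ordered chain of functional unit slots, and an operation may read a result produced in an earlier slot of the same word. The names `Op`, `schedule` and `chain`, and the unit types `"alu"` and `"mem"`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    kind: str                                  # functional unit type needed (assumed labels)
    deps: list = field(default_factory=list)   # names of ops whose results this op reads


def schedule(ops, chain):
    """Greedy list scheduling onto instruction words with chained slots.

    ops   -- operations in topological order (every dependence appears earlier)
    chain -- ordered functional unit types of one instruction word,
             e.g. ["alu", "alu", "mem"]; this is the configurable part
    Returns a list of words; each word lists the op name (or None) per slot.
    """
    placed = {}   # op name -> (word index, slot index)
    words = []
    for op in ops:
        if op.kind not in chain:
            raise ValueError(f"no functional unit of type {op.kind!r}")
        # Earliest candidate word: the latest word that holds a predecessor.
        w = max((placed[d][0] for d in op.deps), default=0)
        while True:
            if w == len(words):
                words.append([None] * len(chain))
            # Chaining rule (assumed): a predecessor in the same word must
            # occupy an earlier slot, so its result can be forwarded.
            first = max((placed[d][1] + 1 for d in op.deps
                         if placed[d][0] == w), default=0)
            slot = next((s for s in range(first, len(chain))
                         if chain[s] == op.kind and words[w][s] is None), None)
            if slot is not None:
                words[w][slot] = op.name
                placed[op.name] = (w, slot)
                break
            w += 1   # no suitable free slot here, try the next word
    return words


# Three dependent operations: a -> b -> c
ops = [Op("a", "alu"), Op("b", "alu", ["a"]), Op("c", "mem", ["b"])]
wide = schedule(ops, ["alu", "alu", "mem"])
narrow = schedule(ops, ["alu", "mem"])
print(len(wide), len(narrow))
```

Under these assumptions, the dependent chain a, b, c fits into a single word when the configuration provides two ALU slots ahead of the memory slot, and needs two words when only one ALU slot is available. Changing the `chain` list is the analogue of retargeting the compiler to a different functional unit configuration.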
