Effect of Loop Unrolling in Heterogeneous Multi-pipeline ASIPs

Introduction

Embedded systems are becoming ubiquitous and pervasive, touching virtually all aspects of daily life. From mobile telephones to automobiles, and from industrial equipment to high-end medical devices, embedded systems now form part of a wide range of devices. Along with non-recurring engineering cost, power consumption, die size, and performance are among the main design challenges of embedded devices. Although embedded devices used in real-time applications are expected to react quickly, and therefore require high performance, the designers of such systems must always keep an eye on the power consumption and cost of the design.

Since embedded systems usually execute a single application or a small class of applications, processors can be customized to optimize for performance, cost, power, and so on. One popular design platform of this kind is the Application Specific Instruction-set Processor (ASIP), which allows such customization without overly hindering design flexibility. Numerous tools and design systems, such as ASIP-meister and Xtensa, have been developed for rapid ASIP generation.

ASIPs usually contain a single execution pipeline. Recently, however, there has been a trend towards multiple pipelines [1, 7]. In [1], a design system was proposed for ASIPs with a varying number of pipelines. Given an application specified in C, the design system generates a processor with a number of heterogeneous pipelines specifically suited to that application. Each pipeline is customized with a different instruction set, and instructions are executed in parallel across all pipelines. The number of cycles needed to execute a program can therefore be lower than on a single-pipeline ASIP, improving the overall performance of the system. This paper describes loop unrolling as a way of increasing the performance of an ASIP.
Loop unrolling is a compiler technique that can reduce the number of clock cycles spent executing a loop in a program [3, 4]. Although loop unrolling is a traditional compiler optimization, this is the first time it has been attempted in the scheduling algorithm of a multi-pipeline ASIP design. This paper reports the effect of loop unrolling on the performance of a heterogeneous multi-pipeline ASIP.
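
As a generic illustration of the transformation itself (not the scheduling algorithm studied in this paper), the following C sketch shows a loop before and after unrolling by a factor of four. The function names, the dot-product kernel, and the unroll factor are assumptions chosen for the example. The unrolled body performs four multiply-accumulates per loop test and branch instead of one, and the four accumulators are independent of each other, which is the kind of instruction-level parallelism a scheduler could in principle distribute across several pipelines.

```c
#include <stddef.h>

/* Baseline loop: one multiply-accumulate per iteration, with the loop
   overhead (index increment, compare, branch) paid for every element. */
long dot_rolled(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Same computation unrolled by 4 (illustrative factor).
   The loop overhead is paid once per four elements, and the four
   partial sums do not depend on each other. */
long dot_unrolled4(const int *a, const int *b, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main unrolled body: handles elements in groups of four. */
    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }

    long sum = s0 + s1 + s2 + s3;

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        sum += (long)a[i] * b[i];

    return sum;
}
```

Both functions return the same result for any input length; the remainder loop covers trip counts that are not a multiple of the unroll factor. The cost of unrolling is larger code size, which matters for the die-size and power constraints mentioned above.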

[1] Hui Guo, et al. Customization of application specific heterogeneous multi-pipeline processors. Proceedings of the Design Automation & Test in Europe Conference, 2006.