Automatic Analysis of Loops to Exploit Operator Parallelism on Reconfigurable Systems

With rapid advances in FPGA and other hardware technologies, architectures based on configurable computing engines, in which the Arithmetic Logic Unit (ALU) can be modified on the fly during computation, are becoming popular. Configurable architectures offer an opportunity to adapt the underlying hardware to the computation for greater efficiency. The need for reconfiguration typically arises because a given ALU configuration is better suited to executing a particular algorithmic step. Since a program is an abstraction of a sequence of algorithmic steps, the need for reconfiguration (i.e., changing from one configuration to another) arises at the program points corresponding to these steps. Identifying the optimal configuration at each step of a program is a very complex problem, but solving it allows the power of these architectures to be fully exploited. The success of these architectures depends critically on the effectiveness of the compiler, and research in this area is just beginning. This paper focuses specifically on an automatic compilation framework developed to effectively exploit operator parallelism.
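
The configuration-selection problem described above can be made concrete with a toy model. The sketch below is not the paper's compilation framework. It assumes a hypothetical cost model in which each algorithmic step has a known execution cost under each ALU configuration and every switch between configurations costs a fixed amount. Under that assumption, one configuration per step can be chosen by dynamic programming over the sequence of steps. The function name `choose_configurations`, the configuration names, and all costs are illustrative.

```python
from typing import Dict, List, Tuple


def choose_configurations(
    exec_cost: List[Dict[str, int]], reconfig_cost: int
) -> Tuple[int, List[str]]:
    """Pick one ALU configuration per step, minimizing total cost.

    Hypothetical cost model: exec_cost[i][c] is the cost of running step i
    under configuration c, and switching configuration between consecutive
    steps costs reconfig_cost. Loading the first configuration is free.
    """
    if not exec_cost:
        return 0, []
    # best[c] = (cost, path) of the cheapest schedule so far that ends in c.
    best = {c: (cost, [c]) for c, cost in exec_cost[0].items()}
    for step in exec_cost[1:]:
        new_best = {}
        for c, cost in step.items():
            # Extend every previous schedule to c, charging a reconfiguration
            # penalty only when the configuration actually changes.
            candidates = (
                (prev_cost + (0 if p == c else reconfig_cost), path)
                for p, (prev_cost, path) in best.items()
            )
            total, path = min(candidates, key=lambda t: t[0])
            new_best[c] = (total + cost, path + [c])
        best = new_best
    return min(best.values(), key=lambda t: t[0])


# Two hypothetical configurations: a multiplier-heavy one and an adder-heavy one.
steps = [
    {"mul": 2, "add": 6},
    {"mul": 3, "add": 7},
    {"mul": 9, "add": 1},
    {"mul": 8, "add": 2},
]
print(choose_configurations(steps, reconfig_cost=4))
# → (12, ['mul', 'mul', 'add', 'add'])
```

With a reconfiguration cost of 4, switching once after the second step is cheaper than staying in either configuration throughout (22 for all `mul`, 16 for all `add`). The search costs O(n·k²) for n steps and k configurations. A realistic model would likely need pairwise reconfiguration costs and richer per-step estimates.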
