Data reorganization engines for the next generation of system-on-a-chip FPGAs

Field-Programmable Core Arrays (FPCAs) will include various computing cores for a wide range of applications, from DSP to general-purpose computing. As the gap between core computing speeds and memory access latency widens, managing and orchestrating the movement of data across multiple cores will become increasingly important. In this paper we propose data reorganization engines that support a wide variety of data reorganizations, both within and across memory modules, for future FPCAs. We have experimented with a suite of data reorganizations pervasive in DSP applications. Our limited set of experiments reveals that the proposed designs for these engines are flexible and occupy little design area in current FPGA fabrics, making them amenable to integration in future FPCAs as either soft or hard macros.
